Convergence in Models with Bounded Expected Relative Hazard Rates* 
Abstract

We provide a general framework to study stochastic sequences related to an array of models in 
different literatures, including models of individual learning in economics, learning automata in com- 
puter sciences, social learning in marketing, and many others. In this setup, we study the asymptotic 
properties of a class of stochastic sequences that take values in [0, 1] and satisfy a property that we 
call "bounded expected relative hazard rates." We provide sufficient conditions under which related sequences
that, compared with the original sequence, either move slowly or slow down over time converge to one
with high probability or almost surely, respectively. We apply these results to show the optimality of the
learning models in Börgers, Morales, and Sarin (2004), Erev and Roth (1998), and Schlag (1998).
1 Introduction

Stochastic sequences that arise in many models from different disciplines exhibit expected hazard rates 
that are proportional to the sequence's current value. Analyses of these models are usually concerned 
with whether these sequences converge towards a certain upper bound. For instance, models of technology 
adoption often satisfy that the change in the fraction of a population that adopts a new technology is 
proportional to the product of the current fraction of adopters and the current fraction of non-adopters
(see, e.g., Young (2009)). This feature, namely, that the hazard rate of adoption is proportional to
the current fraction of adopters, is typically motivated by the assumption that diffusion of technology
requires non-adopters to observe adopters in order to learn about the new technology.¹ Several models of
individual learning, imitation, and epidemics, among others, share this feature. Furthermore, if this
proportionality is constant and the path of the fraction of adopters is deterministic, convergence to full
adoption is guaranteed. In contrast, when the adoption path is stochastic, a number of things might
happen that would cause full adoption to fail. For instance, the new technology may be completely
abandoned at some point in time, or the adoption may lose strength. This paper is concerned with the
general issue of providing sufficient conditions under which stochastic sequences that take values in [0, 1],
exhibit submartingality, and have expected hazard rates proportional to their current value converge to one.

Our convergence results are cast within a framework that encompasses several setups that appear in 
the literature. In particular, we consider states of the world that correspond to an infinite sequence of 
one-period choices of Nature whose realization is revealed in each period. These realizations determine 
the dynamics of the variables we focus on. We are interested in a convex subset of real vectors that 
we call the configuration space and whose elements are called configurations. These vectors play a role 
analogous to the probabilities of choosing each action in standard models of individual learning (see, e.g.,
Erev and Roth (1998)), or to the fractions of the population that choose each action in social learning
models (see, e.g., Ellison and Fudenberg (1995)). We focus on the performance of the updating process
in each configuration. In general, performance may be measured through a linear function defined on the 
configuration space. We call this function the aggregator, and it maps each configuration to a real number 
in the unit interval that may be interpreted as a measure of the performance of the process when such a 
configuration is reached, and that we refer to as the performance measure. Such an aggregator function 
can be very simple. It may correspond to the sum of the probabilities of choosing an optimal action in a 
model of individual learning or the fraction of the population that chooses an optimal action in population 
models. We also consider a different variable, called the state of information, that summarizes the relevant 
information that has been revealed up to each period. The aggregator function does not depend directly 
on the state of information, however, as performance is assumed to be completely determined by the 
configuration. It thus follows that the way the configuration changes in response to the information that 
becomes available plays a critical role in whether an updating rule yields an increase of the performance 
measure over time. The function that determines how the configuration changes is the updating rule. This 
function maps the current configuration, the state of information, and the new information that becomes 
available to an updated configuration. A pair of an updating rule and an aggregator is referred to as 
a system. The properties of the system play a critical role in our analysis, because they determine the
performance dynamics.

¹The classic Bass model (Bass (1969)) of new product growth in marketing also satisfies this property (see, e.g.,
Jackson and Yariv (2011)).

We focus on systems whose performance measure satisfies a property that we call (weakly)
bounded expected relative hazard rates, or (W)BERHR. This property requires the expected ratio of the
hazard rate of the performance measure to the performance measure itself to be non-negative.² As we shall see
below, several models of learning in economics, as well as various models from other disciplines, including
biology, computer science, marketing, and psychology, satisfy this property. This property
has the intuitive interpretation suggested above in models of technology diffusion, where new adopters
turn to a new technology when they sample current adopters. Most such models are deterministic, and
hence full adoption is unavoidable. In contrast, the models considered in this paper are
stochastic, and we provide sufficient conditions to rule out nonconvergence to full adoption. This enables
us to obtain convergence results for this class of models in setups that allow for stochastic dynamics.

Our first result, Theorem 1, analyzes the asymptotic properties of a sequence that can be obtained as
a slow version of a sequence that satisfies WBERHR or BERHR, i.e., a sequence whose configurations
change less in response to the information revealed in each period. We show that the probability of
achieving optimality, i.e., the probability that the performance measure converges to one, can be made
arbitrarily high by considering a sequence that is slow enough. This result considerably generalizes
a previous result by Lakshmivarahan and Thathachar (1976) in the machine learning literature on
simple learning automata. It also allows us to obtain novel convergence results in different contexts,
including, for instance, the models of social learning that we discuss below. Theorem 1, however, has two
important limitations. First, optimality, although likely, is not guaranteed. Second, for each certainty level
of achieving convergence to one, the degree of slowness needed is usually determined by the probability
measure of the underlying probability space, which, in applications, is typically assumed to be unknown.
Both problems are tackled by our second and third results.

Some sequences are primitively defined in such a manner that they can be interpreted as a slow 
version of the performance measure of some system. This allows us to analyze the asymptotic properties 
of these sequences without looking at their underlying system. We follow this approach in Theorem 2. In
this result, we consider an extra condition that may be interpreted as requiring arbitrary slowing down
over time. This yields sufficient conditions for achieving optimality almost surely. Although we provide
conditions for optimality in a general abstract setup, encompassing a wide array of settings studied in the
literature, the conditions that lead to convergence are consistent with some typically presumed features
of the systems in certain applications. In particular, our condition of arbitrary slowing down over time
is consistent with the "power law of practice" in models of individual learning (see, e.g., Erev and Roth
(1998) and the references therein).

²More precisely, if the performance measure at time t is denoted by $P_t$, BERHR requires that the expected value of
$(P_{t+1}-P_t)/((1-P_t)P_t)$ be bounded from below by a strictly positive random variable, whereas WBERHR allows these
bounds to be time-dependent and only requires them to be non-negative.

Our third general result, Corollary 1, provides sufficient conditions for the slow version of a system
that satisfies BERHR to yield optimality almost surely. These conditions require the system to slow down
linearly, i.e., the expected change in the configuration in response to new information has to decrease
linearly over time.³

Next we turn our attention to applications. In the first application, we consider several models of individual
learning with both partial and full information. Within the class of models with partial information,
we consider a generalized version of the class of monotone learning rules in Börgers, Morales, and Sarin
(2004) that extends their analysis to a dynamic setting. In this setup, the space of configurations
corresponds to the set of vectors containing the probabilities of choosing each action, and the aggregator
corresponds to the sum of the probabilities of choosing the expected payoff maximizing actions. It is
readily verified that this class of learning models satisfies WBERHR. Börgers, Morales, and Sarin (2004)
show that this class of learning rules, in expected value, increases the probability of choosing expected
payoff maximizing actions in the short run, i.e., in each period. Our results significantly strengthen their
findings. By adding linear slowing down and using Corollary 1, we obtain that these learning models converge
almost surely to the set of payoff maximizing actions. We provide a similar construction in a setting where
individuals receive full information, i.e., observe both obtained and forgone payoffs.

In a second application, we show that Theorem 2 can be used to obtain an alternative proof of the
well-known result that the learning model of Roth and Erev (1995) converges to payoff maximization.
In contrast to previous arguments appearing in the literature (e.g., Rustichini (1999), Beggs (2005),
Hopkins and Posch (2005)), our argument builds on the properties of the expected relative hazard rates
of this model.⁴ Hence, our results provide a different interpretation of the convergence property of this
model.

In a third application, we look at models of word-of-mouth social learning. Schlag (1998) characterizes
a set of imitation rules that allow the average payoff of the population, in expected value, to increase
in every single period when each individual observes her own and another individual's choices and payoffs.
These rules are called improving rules. He also shows that for populations with a continuum of individuals,
the fraction of the population that chooses the optimal action converges to one. For finite populations,
however, little is known about the probability of convergence to the payoff maximizing actions. Whereas
continuum populations provide a useful benchmark for analyzing choice dynamics in the context of social
learning, finite populations are important as well. Most of the time, the subject of analysis in economics
involves only a finite number of individuals. It would be somewhat discouraging if the positive long-run
results we know for continuum populations had no counterparts for finite populations. We provide
sufficient conditions for finite populations to converge almost surely to choosing the optimal actions.
These conditions require, first, introducing an inertial behavior component, and second, imposing that
changes in the probabilities of choosing each action by each individual decrease linearly over time.
Together with the improving properties previously studied in the literature, these conditions guarantee
that the population eventually converges to choosing the optimal action.

³More precisely, the lower bound of the expected value of $(P_{t+1}-P_t)/((1-P_t)P_t)$ for the slow version is $\theta_t$ times the
lower bound for the original sequence.

⁴Some of the previous studies that have analyzed the asymptotic properties of the Roth-Erev learning model have incorrectly
applied results of the theory of stochastic approximations (a remarkable exception is Beggs (2005), whose study of the
properties of this model in decision problems is based on the asymptotic properties of the "attractions").
Hopkins and Posch (2005), however, show that the results in those studies are, nevertheless, correct.



Related literature. Our convergence results for sequences that change arbitrarily slowly over time are
related to findings in the literature on learning in games. In particular, it is well known that in learning
models where the probabilities of choosing each action are based on assessments of their performance,
linear slowing down in the adjustment of assessments yields long-run behavior that corresponds to
continuous-time deterministic dynamics (see, e.g., Ch. 4 and 5 in Fudenberg and Levine (1998)). It is also
known that continuous-time limits describe the behavior of discrete-time models over a finite number of
periods (see, e.g., Börgers and Sarin (1997)). Yet, this literature offers no results about the probability of
achieving optimality for models such as those in Börgers, Morales, and Sarin (2004) or Schlag (1998), despite
their robust submartingality with arbitrary payoff distributions. Our results provide a unifying approach for
analyzing this issue and fill this gap.



Lakshmivarahan and Thathachar (1976) analyze a learning automaton that repeatedly chooses between two
actions, each of which yields one of only two possible outcomes, failure or success. They show that a
learning automaton that satisfies a slightly stronger version of our BERHR condition converges with high
probability to choose the action that is more likely to yield success, provided that changes in the probability
of choosing each action are small. Several authors use the results of Lakshmivarahan and Thathachar (1976)
in the learning automata literature (see, e.g., Torkestani and Meybodi (2009)). Oyarzun and Sarin (2012)
adapt the techniques in Lakshmivarahan and Thathachar (1976) to provide convergence of a class
of learning models to risk averse choice. Their configuration space corresponds to a vector of probabilities
of choosing each action. These probabilities are a function of a "state of learning" that follows a first-order
Markov process. The settings in these papers are less general than ours, and their results are implied by
our Theorem 1. None of these papers has a counterpart to our almost-sure convergence results, Theorem 2
and Corollary 1, as the models they analyze fail to satisfy our arbitrary slowing-down conditions.



The paper that is closest to what we study here is that of Lamberton, Pagès, and Tarrès (2004), who
analyze the asymptotic properties of the Bush and Mosteller (1951) model of learning with a varying
gain parameter. They provide conditions that yield optimality almost surely. Their analysis is tailored
to the specific characteristics of the Bush and Mosteller (1951) model, however, whereas the generality of
our framework allows us to apply our results in different settings.



2 The model 

In this section, we provide the analytical framework of our analysis, introduce the concept of bounded 
expected relative hazard rates, and show that this condition, in general, is not sufficient to yield optimality 
almost surely. 

2.1 General framework 

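States of the world. All possible states of the world are represented by the measurable product space
$(\Omega, \mathcal{F}) = \left(\prod_{t=1}^{\infty}\Omega_t, \bigotimes_{t=1}^{\infty}\mathcal{F}_t\right)$, where $\Omega_t$ stands for the set of states that may occur at time $t \in \mathbb{N}$ and is
equipped with a σ-algebra $\mathcal{F}_t$ describing the set of events. Furthermore, let $\mathcal{F}_{[0,t]} := \bigotimes_{i=1}^{t}\mathcal{F}_i$ denote the
set of all events that may occur up to time $t \in \mathbb{N}$, and let $\mathcal{F}_{[0,0]} := \{\emptyset, \Omega\}$.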

Information space. The set of all information that may ever be available is denoted by $\mathcal{S}$. We call
an element $f \in \mathcal{S}$ a state of information. There is no need to specify these states for the results in this
section; however, we assume that there exists a sequence $\{\mathcal{S}_t\}_{t\in\mathbb{N}_0}$ of nonempty subsets of $\mathcal{S}$ representing
the possible information revealed up to any point in time.

Configuration space. We consider a convex subset $\Theta$ of $\mathbb{R}^D$, where $D \in \mathbb{N}$. An element $\sigma \in \Theta$
in the configuration space determines the state of the process, which we shall distinguish from the state of
the world, described by the set $\Omega$.

Aggregator. Each configuration $\sigma \in \Theta$ is mapped to a measure of optimality through a linear
function denoted by $\mathfrak{A} : \Theta \to [0, 1]$. We refer to this function as the aggregator and to $\mathfrak{A}(\sigma)$ as the
performance measure of the configuration $\sigma$.

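Updating rule. The central point of our analysis is an updating rule described by a sequence
$\Pi = \{\Pi_t\}_{t\in\mathbb{N}}$ of functions $\Pi_t : \Omega_t \times \mathcal{S}_{t-1} \times \Theta \to \Theta$. A second sequence $\Pi^{\mathcal{S}} = \{\Pi^{\mathcal{S}}_t\}_{t\in\mathbb{N}}$ of functions
$\Pi^{\mathcal{S}}_t : \Omega_t \times \mathcal{S}_{t-1} \times \Theta \to \mathcal{S}_t$ describes how the state of information is updated. We shall usually omit the first
argument of $\Pi_t$ and $\Pi^{\mathcal{S}}_t$ for the sake of notation.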

System. We call a pair of an updating rule and an aggregator, $(\Pi, \mathfrak{A})$, a system.

Probability. We fix a probability measure $\mathbb{P}$, defined on $(\Omega, \mathcal{F})$, and we denote by $\mathbb{E}_t[\cdot]$ its conditional
expectation operator and by $\mathbb{P}_t$ the conditional probability measure, given $\mathcal{F}_{[0,t]}$, for all $t \in \mathbb{N}_0$.

We thus study a model that evolves in discrete time. At any point in time, the system updates the
configuration in a probabilistic way, according to the information that becomes available, depending on 
the state of the world. Then, the aggregator maps the new configuration to its performance measure. 



The individual and social learning models that we consider in the applications section correspond
to particular examples of this framework. In these models, the states that occur at each time $t \in \mathbb{N}$
determine the actions chosen by the individuals, the obtained and forgone payoffs, and the information
revealed to each individual. The probability measure has two components: the first one determines
the distribution of a technical randomization device that is used to model the probabilistic choices of
individuals, whereas the second one describes the environment in which individuals live. The state of
information typically corresponds to a function of the information, up to time $t \in \mathbb{N}$, that has been
revealed to each individual, with a suitable updating rule. The configuration space typically is a set of
tuples of vectors in the simplex over the actions from which individuals choose. These vector tuples correspond
to the probability of choosing each action for each individual. The aggregator maps those vectors to the 
probability of choosing an optimal action, in models of individual learning, and to an average probability 
of choosing optimal actions, in models of social learning. The updating rule is the key ingredient of the 
model and it is determined by a function that maps each individual's observed part of the state of the 
world, previous information, and the current probability of choosing each action to an updated probability 
of choosing each action. 

The distinction between configurations and states of information allows us to isolate what determines 
the performance of the system from potentially richer descriptions of the information accumulated by the 
system. The example of the Roth-Erev model of individual learning analyzed below makes this distinction 
clear. In that model, the configuration is the vector of probabilities of choosing each action, whereas the
state of information corresponds to the vector of "attractions" of each action.
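The distinction can be illustrated with a minimal Python sketch of the basic Roth-Erev reinforcement scheme. It assumes non-negative payoffs and omits the forgetting and experimentation parameters of richer variants; the function names `configuration` and `reinforce` are hypothetical. The attractions play the role of the state of information, and only their normalization, the configuration, would enter an aggregator.

```python
def configuration(attractions: list[float]) -> list[float]:
    """Map attractions (state of information) to choice probabilities (configuration)."""
    total = sum(attractions)
    return [q / total for q in attractions]


def reinforce(attractions: list[float], chosen: int, payoff: float) -> list[float]:
    """Basic reinforcement (assumption: no forgetting, no experimentation):
    add the non-negative payoff of the chosen action to its attraction."""
    if payoff < 0:
        raise ValueError("this sketch assumes non-negative payoffs")
    updated = list(attractions)
    updated[chosen] += payoff
    return updated


if __name__ == "__main__":
    q0 = [1.0, 1.0]                      # hypothetical initial attractions
    q1 = reinforce(q0, chosen=1, payoff=2.0)
    print(configuration(q0))             # [0.5, 0.5]
    print(configuration(q1))             # [0.25, 0.75]
```

Scaling all attractions up leaves the configuration unchanged but damps the response to the same payoff: starting from `[2.0, 2.0]` instead of `[1.0, 1.0]`, the same reinforcement yields probabilities of one third and two thirds rather than 0.25 and 0.75. This is why the state of information is tracked separately from the configuration.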

2.2 Bounded expected relative hazard rates 

In order to get an intuitive idea of the class of sequences that we analyze in this paper, it is useful to
consider, a bit more formally, the interpretation borrowed from the literature on technology adoption
(see, e.g., Young (2009)), which we mentioned in the introduction. Let $P_t \in [0,1]$ denote the fraction
of a continuum population that has adopted a new technology by time $t \in \mathbb{N}_0$. In models of technology
adoption, the change in the fraction of the population that adopts the new technology is given by the rate
at which adoption occurs when a non-adopter at time t observes an adopter. If only one other individual
is observed and the probability of observing other individuals in the population is uniform, the fraction of
time-t non-adopters who observe time-t adopters is $P_t(1-P_t)$. Only some of them are assumed to adopt
the new technology in $t+1$. In the spirit of survival theory, we call this fraction of (net) new adopters,
namely $(P_{t+1}-P_t)/(P_t(1-P_t))$, the relative hazard rate of $P := \{P_t\}_{t\in\mathbb{N}_0}$.⁵ The relative hazard rates
of several models that we analyze below are random. The objective of this paper is to provide conditions
on the expected value of the relative hazard rates for $P$ to converge to one.
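As a minimal illustration, assuming a constant relative hazard rate δ ∈ (0, 1] and no randomness, the deterministic benchmark $P_{t+1} = P_t + \delta P_t(1-P_t)$ can be iterated and its relative hazard rates recovered from the path. The function names are hypothetical; the restriction δ ≤ 1 keeps $P_t$ in [0, 1], since $1 - P_{t+1} = (1-P_t)(1-\delta P_t)$.

```python
def adoption_path(p0: float, delta: float, periods: int) -> list[float]:
    """Iterate P_{t+1} = P_t + delta * P_t * (1 - P_t) (deterministic, constant hazard)."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1] so that P_t stays in [0, 1]")
    path = [p0]
    for _ in range(periods):
        p = path[-1]
        path.append(p + delta * p * (1.0 - p))
    return path


def relative_hazard_rates(path: list[float]) -> list[float]:
    """Return (P_{t+1} - P_t) / (P_t (1 - P_t)), skipping periods with P_t in {0, 1}."""
    rates = []
    for p, q in zip(path, path[1:]):
        denom = p * (1.0 - p)
        if denom > 0.0:
            rates.append((q - p) / denom)
    return rates


path = adoption_path(p0=0.1, delta=0.5, periods=20)
print(path[-1])                                                  # close to full adoption
print(min(relative_hazard_rates(path)), max(relative_hazard_rates(path)))  # both close to 0.5
```

In the stochastic models studied below, the relative hazard rate is random and only its expected value is controlled, so this deterministic convergence no longer follows automatically.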

Some of our applications involve models of individual learning where this interpretation does not 
directly apply. We shall see, however, that such models generate behavior that is related to the dynamics 
of the models of technology adoption we mentioned above. We allow the relative hazard rates to be 
stochastic, and we shall focus on sequences whose expected relative hazard rates are bounded from below. 

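Definition 1. We say that a system $(\Pi, \mathfrak{A})$ satisfies the weakly bounded expected relative hazard rates
property (WBERHR) with lower bound sequence $\delta := \{\delta_t\}_{t\in\mathbb{N}_0}$, where $\delta_t \ge 0$ is $\mathcal{F}_{[0,t]}$-measurable, if

$$\mathbb{E}_t\left[\mathfrak{A}\left(\Pi_{t+1}(f, \sigma)\right)\right] - \mathfrak{A}(\sigma) \ge \delta_t \cdot \mathfrak{A}(\sigma)\left(1-\mathfrak{A}(\sigma)\right) \tag{1}$$

for all $f \in \mathcal{S}_t$, $\sigma \in \Theta$, and $t \in \mathbb{N}_0$. We say that $(\Pi, \mathfrak{A})$ satisfies the bounded expected relative hazard
rates property (BERHR) if it satisfies WBERHR with a lower bound sequence $\delta := \{\delta_t\}_{t\in\mathbb{N}_0}$ such that
$\inf_{t\in\mathbb{N}_0}\{\delta_t\} > 0$ almost surely.⁶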

We consider a random sequence $P := \{P_t\}_{t\in\mathbb{N}_0}$ with $P_t := \mathfrak{A}(\sigma_t)$, $\sigma_t = \Pi_t(f_{t-1}, \sigma_{t-1})$ and $f_t = \Pi^{\mathcal{S}}_t(f_{t-1}, \sigma_{t-1})$
for all $t \in \mathbb{N}$, and $f_0 \in \mathcal{S}_0$ and $\sigma_0 \in \Theta$ exogenously given. If $(\Pi, \mathfrak{A})$ satisfies BERHR or
WBERHR, then $P$ is a bounded submartingale and hence there exists a random variable $P_\infty$ such that
$\lim_{t\to\infty} P_t = P_\infty$ almost surely. Our analysis is mainly concerned with optimality, i.e., the event $\{P_\infty = 1\}$,
and the properties of a system that make this event occur with high probability.
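For a one-dimensional system with finitely many outcomes per period, inequality (1) can be checked numerically on a grid of configurations. The Python sketch below assumes that the aggregator is the identity on [0, 1], that the rule does not depend on the state of information, and that the lower bound is a constant δ; `toy_rule` is a hypothetical rule used only for illustration. A finite grid gives a sanity check, not a proof.

```python
from typing import Callable

# A finite-outcome updating rule: sigma -> list of (probability, next sigma).
Rule = Callable[[float], list[tuple[float, float]]]


def relative_drift(rule: Rule, sigma: float) -> float:
    """(E[A(next)] - A(sigma)) / (A(sigma) * (1 - A(sigma))) with A the identity on [0, 1].

    sigma must lie strictly between 0 and 1."""
    outcomes = rule(sigma)
    if abs(sum(p for p, _ in outcomes) - 1.0) > 1e-12:
        raise ValueError("outcome probabilities must sum to one")
    expected_next = sum(p * nxt for p, nxt in outcomes)
    return (expected_next - sigma) / (sigma * (1.0 - sigma))


def satisfies_bound(rule: Rule, delta: float, grid: list[float], tol: float = 1e-12) -> bool:
    """Check inequality (1) with a constant lower bound delta on a grid of interior sigmas."""
    return all(relative_drift(rule, s) >= delta - tol for s in grid)


def toy_rule(sigma: float) -> list[tuple[float, float]]:
    """Hypothetical rule: with probability sigma, move 30% of the way toward one; else stay."""
    return [(sigma, sigma + 0.3 * (1.0 - sigma)), (1.0 - sigma, sigma)]


grid = [k / 100 for k in range(1, 100)]
print(satisfies_bound(toy_rule, 0.3, grid))   # True: the relative drift is 0.3 up to rounding
print(satisfies_bound(toy_rule, 0.31, grid))  # False
```

For `toy_rule`, the expected change is $0.3\,\sigma(1-\sigma)$, so the largest constant lower bound is exactly 0.3.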

2.3 Examples for the lack of almost sure optimality 

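BERHR does not guarantee almost-sure convergence of $P$ to one; that is, in general $\mathbb{P}(P_\infty \ne 1) > 0$ is possible. Yet, as
a lemma in the appendix reveals, if $(\Pi, \mathfrak{A})$ satisfies WBERHR with lower bound sequence $\delta = \{\delta_t\}_{t\in\mathbb{N}_0}$
such that $\sum_{t=0}^{\infty}\delta_t^2 = \infty$ almost surely, then $P_\infty \in \{0, 1\}$ for all $f_0 \in \mathcal{S}_0$ and $\sigma_0 \in \Theta$.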

To provide a first simple example for the lack of almost-sure convergence, consider $\Theta = [0, 1]$, $\mathcal{S} = \{0\}$,
the updating rule $\Pi$ defined by the sequence of random variables $\Pi_t(0, \sigma)$ taking values 0 and 1 with
probability $(1-\sigma)/2$ and $(1+\sigma)/2$, respectively, for all $\sigma > 0$, and $\Pi_t(0, 0) = 0$; and the aggregator $\mathfrak{A}$
defined by $\mathfrak{A}(\sigma) = \sigma$.⁷ It is clear that $(\Pi, \mathfrak{A})$ satisfies BERHR with lower bound sequence $\delta = \{\delta_t\}_{t\in\mathbb{N}_0}$
given by $\delta_t = 1/2$ for all $t \in \mathbb{N}_0$: indeed, $\mathbb{E}_t[\Pi_{t+1}(0, \sigma)] - \sigma = (1-\sigma)/2 \ge \sigma(1-\sigma)/2$. Moreover,
$P_\infty = P_1$, which is zero with probability $(1-\sigma_0)/2$. Thus, we have optimality only with probability $(1+\sigma_0)/2$.

⁵The term "relative hazard rate" was coined by Young (2009), who analyzes the empirical relevance of alternative
models of social learning regarding their predictions for the dynamics of the relative hazard rates over time.

⁶We emphasize that the BERHR property does not require that $\inf_{t\in\mathbb{N}_0}\{\delta_t\}$ be uniformly (in $\omega \in \Omega$) bounded
away from zero.

⁷In order to fully describe the system, we need to specify the set of states of the world. Here, for example, one could
set $\Omega_t = [0, 1]$ and define the probability measure $\mathbb{P}$ as the product measure of uniform measures on $\Omega_t$. Then, the full
specification of the updating rule is $\Pi_t(\omega, 0, \sigma) = \mathbf{1}_{\{\omega < (1+\sigma)/2\}}$ for all $\sigma > 0$ and $\Pi_t(\omega, 0, 0) = 0$.
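A short Monte Carlo sketch in Python makes the failure of almost-sure optimality in this example visible: the simulated frequency of $\{P_\infty = 1\}$ should be close to $(1+\sigma_0)/2$. It assumes the uniform-draw specification of footnote 7; the function names, the seed, and the number of runs are arbitrary choices for illustration.

```python
import random


def jump_step(sigma: float, rng: random.Random) -> float:
    """One step of the example rule: jump to 1 if a uniform draw is below (1 + sigma)/2, else to 0.

    The configuration 0 is absorbing; 1 is absorbing because (1 + 1)/2 = 1."""
    if sigma == 0.0:
        return 0.0
    return 1.0 if rng.random() < (1.0 + sigma) / 2.0 else 0.0


def optimality_frequency(sigma0: float, runs: int, periods: int = 10, seed: int = 0) -> float:
    """Fraction of simulated paths with P_T = 1 after `periods` steps (P_t is frozen from t = 1 on)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        sigma = sigma0
        for _ in range(periods):
            sigma = jump_step(sigma, rng)
        if sigma == 1.0:
            hits += 1
    return hits / runs


sigma0 = 0.4
print(optimality_frequency(sigma0, runs=20_000), (1 + sigma0) / 2)  # simulated vs. exact
```

Since the path is absorbed at $t = 1$, a handful of periods suffices; more periods would not change the estimate.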

The last example is degenerate in the sense that the system does not move after time t = 1, when P gets
absorbed at either zero or one. As we illustrate now, by adapting an example in Viswanathan and Narendra
(1972), even systems

• that satisfy a strong version of BERHR, so that the lower bound sequence is uniformly (in $\omega \in \Omega$)
bounded away from zero, and

• that have a performance sequence $P$ that never gets absorbed, to wit, $P_t \in (0, 1)$ for all $t \in \mathbb{N}_0$,

may not satisfy optimality almost surely. Towards this end, we set $\Omega_t = \{\omega = (\omega_1, \omega_2, \omega_3) \in \{0,1\}^2 \times [0, 1]\}$,
$\mathcal{F}_t$ the corresponding Borel σ-algebra, $\Theta = [0, 1]$, $\mathcal{S} = \{0\}$, $\mathfrak{A}(\sigma) = \sigma$, and

$$\Pi_t(\omega, 0, \sigma) = \sigma + (1-\beta)\left((1-\sigma)\,\mathbf{1}_{\{\omega_3<\sigma,\ \omega_2=1\}} - \sigma\,\mathbf{1}_{\{\omega_3\ge\sigma,\ \omega_1=1\}}\right) \tag{2}$$

for all $t \in \mathbb{N}$, $\omega = (\omega_1, \omega_2, \omega_3) \in \Omega_t$, and $\sigma \in \Theta$, and some exogenously given constant $\beta \in (0, 1)$.

We fix $\mu_1, \mu_2 \in (0,1)$ with $\mu_1 < \mu_2$ and assume that the underlying probability measure $\mathbb{P}$ is the
product measure of the measures $\mathbb{P}_t$, defined on $(\Omega_t, \mathcal{F}_t)$, which are distributed as a Bernoulli random
variable with parameter $\mu_1$ in the first component, as a Bernoulli random variable with parameter $\mu_2$ in the
second component, and uniformly in the third component, such that the three components
are independent; to wit,

$$\mathbb{P}_t(\omega_1 = a, \omega_2 = b, \omega_3 \le c) = \mu_1^a(1-\mu_1)^{1-a}\mu_2^b(1-\mu_2)^{1-b}c$$

for all $t \in \mathbb{N}$, $a, b \in \{0, 1\}$ and $c \in [0, 1]$.

This system describes a learning automaton that, at each time t, chooses one out of two actions and
observes the realization of a failure or success, encoded by $\omega_1$ and $\omega_2$. The first action succeeds with
probability $\mu_1$, the second one with probability $\mu_2$. The probability of choosing the first action is $1-\sigma$,
and the probability of choosing the second action is $\sigma$. If the observed realization is a failure, the configuration
$\sigma$ is unchanged. If the first action is chosen and a success is observed, then the configuration $\sigma$ is decreased
by $(1-\beta)\sigma$. Analogously, if the second action is chosen and a success is observed, then the configuration
$\sigma$ is increased by $(1-\beta)(1-\sigma)$. Thus, observed successes increase the probability of choosing the same
action in the next period.
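The one-step distribution of rule (2) has only three distinct outcomes, so its expected change can be computed exactly. The Python sketch below, with hypothetical function names and illustrative parameter values, checks numerically that the expected change equals $(1-\beta)(\mu_2-\mu_1)\sigma(1-\sigma)$, the drift identity behind the BERHR claim that follows.

```python
def automaton_outcomes(sigma: float, beta: float, mu1: float, mu2: float) -> list[tuple[float, float]]:
    """Exact one-step distribution of the configuration under rule (2).

    sigma is the probability of choosing action 2; mu1 and mu2 are the success probabilities.
    Returns (probability, next configuration) pairs."""
    step = 1.0 - beta
    return [
        (sigma * mu2, sigma + step * (1.0 - sigma)),                 # action 2 chosen, success
        ((1.0 - sigma) * mu1, sigma - step * sigma),                 # action 1 chosen, success
        (sigma * (1.0 - mu2) + (1.0 - sigma) * (1.0 - mu1), sigma),  # failure: no change
    ]


def expected_change(sigma: float, beta: float, mu1: float, mu2: float) -> float:
    """E[sigma_{t+1}] - sigma_t, i.e., the expected change of the performance measure."""
    return sum(p * nxt for p, nxt in automaton_outcomes(sigma, beta, mu1, mu2)) - sigma


beta, mu1, mu2 = 0.5, 0.3, 0.7
for sigma in (0.1, 0.5, 0.9):
    predicted = (1 - beta) * (mu2 - mu1) * sigma * (1 - sigma)
    print(sigma, expected_change(sigma, beta, mu1, mu2), predicted)
```

Both moves keep the configuration strictly inside (0, 1): a success of action 1 maps $\sigma$ to $\beta\sigma > 0$, and a success of action 2 maps it to $\beta\sigma + 1 - \beta < 1$.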

It is straightforward to verify that this system satisfies BERHR with constant lower bound $(1-\beta)(\mu_2-\mu_1) > 0$.
Nevertheless, it can be shown that, with strictly positive probability, the first action is chosen at
all times and therefore the event $\{P_\infty = 0\}$ has positive probability. For the exposition of the argument,
it is useful to partition time into "blocks," i.e., finite sets of consecutive time periods. Then, the
probability of, starting at t = 1, choosing action 1 at all times, in consecutive blocks of length 1, 2, ..., N,
without choosing action 2 between blocks, and with at least one success in each block, is
bounded from below by

$$\prod_{j=1}^{N}\left(1-\beta^{j-1}\sigma_0\right)^{j} \cdot \prod_{j=1}^{N}\left(1-(1-\mu_1)^{j}\right). \tag{3}$$

This follows from two facts. First, the probability of obtaining at least one success in a block of length $j$,
given that action 1 is chosen always in that block, is $1-(1-\mu_1)^j$, for all $j \in \mathbb{N}$. Second, the probability of
choosing action 1 at all times in a block of length $j$, after choosing action 1 at all times starting
at t = 1, in blocks of length 1, 2, ..., $j-1$, without choosing action 2 between blocks, and
with at least one success in each block, is at least $(1-\beta^{j-1}\sigma_0)^j$, for all $j \in \mathbb{N}$. Indeed, each of the at
least $j-1$ earlier successes multiplies $\sigma$ by $\beta$ and failures leave it unchanged, so $\sigma \le \beta^{j-1}\sigma_0$ throughout the block.

We now argue that the limit of the expression in (3), as N tends to ∞, is strictly positive, which
directly yields $\mathbb{P}(P_\infty = 0) > 0$. Since $\sum_{j=1}^{\infty}(1-\mu_1)^j < \infty$, we have that $\prod_{j=1}^{\infty}(1-(1-\mu_1)^j) > 0$.⁸ Finally,
the inequalities $(1-\beta^{j-1}\sigma_0)^j \ge 1 - j\beta^{j-1}\sigma_0 > 0$ (Bernoulli's inequality) for all large enough $j$ imply that the
first product in (3) converges to a strictly positive number, since $\sum_{j=1}^{\infty} j\beta^{j-1} < \infty$ by the ratio test.
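The lower bound (3) is easy to evaluate numerically. The Python sketch below, with a hypothetical function name and illustrative parameter values, computes the partial products for increasing N; they decrease in N but stabilize at a strictly positive value, in line with the argument above.

```python
def block_lower_bound(n_blocks: int, beta: float, sigma0: float, mu1: float) -> float:
    """Evaluate expression (3) for N = n_blocks."""
    stay, succeed = 1.0, 1.0
    for j in range(1, n_blocks + 1):
        # Block j: action 1 is chosen j times in a row while sigma <= beta**(j-1) * sigma0.
        stay *= (1.0 - beta ** (j - 1) * sigma0) ** j
        # Block j: at least one success among the j choices of action 1.
        succeed *= 1.0 - (1.0 - mu1) ** j
    return stay * succeed


for n in (5, 20, 100, 400):
    print(n, block_lower_bound(n, beta=0.5, sigma0=0.5, mu1=0.3))
```

With these parameters, the factors for large j are numerically indistinguishable from one, so the partial products are flat well before N = 400.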

Intuitively, when an action is chosen initially and sufficient successes are observed, there is a positive 
probability that this action is always chosen. As we shall see in the next section, systems where the 
adjustment of the configuration is slowed down rule out this possibility and guarantee that P converges 
to one almost surely. 

3 Convergence results for slow versions 

In this section, we analyze the asymptotic properties of the performance measure of systems that satisfy 
either BERHR or WBERHR. In particular, we provide sufficient conditions for slow versions of these 
systems, in a sense that we make precise below, to yield optimality almost surely. As we now explain, 
the argument works with a gradual slowing down of the changes in the configuration prescribed by the 
updating rule. 

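In order to introduce slow versions of the systems, we first introduce the concept of a slowing sequence,
i.e., a sequence $\theta := \{\theta_t\}_{t\in\mathbb{N}_0}$ such that $\theta_t \in (0, 1]$ is $\mathcal{F}_{[0,t]}$-measurable. For a given slowing sequence $\theta$ and
updating rule $\Pi$, we define a new updating rule, denoted by $\Pi^\theta = \{\Pi^\theta_t\}_{t\in\mathbb{N}}$, as a sequence of functions
$\Pi^\theta_t : \Omega_t \times \mathcal{S}_{t-1} \times \Theta \to \Theta$ such that

$$\Pi^\theta_t(\cdot, \cdot, \sigma) := \sigma + \theta_{t-1}\left(\Pi_t(\cdot, \cdot, \sigma) - \sigma\right) \tag{4}$$

for all $\sigma \in \Theta$ and $t \in \mathbb{N}$. We say that the updating rule $\Pi^\theta$ is a slow version of $\Pi$ and that the system
$(\Pi^\theta, \mathfrak{A})$ is a slow version of $(\Pi, \mathfrak{A})$.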
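Definition (4) is straightforward to mirror in code for rules with finitely many outcomes. The Python sketch below assumes a constant slowing factor θ (the text allows $\mathcal{F}_{[0,t]}$-measurable random factors), an identity aggregator, and a hypothetical `toy_rule`; it checks that the expected change, and hence any lower bound in (1), is multiplied by θ, which follows from the linearity of the aggregator.

```python
from typing import Callable

# A finite-outcome updating rule: sigma -> list of (probability, next sigma).
Rule = Callable[[float], list[tuple[float, float]]]


def slow_version(rule: Rule, theta: float) -> Rule:
    """Slow version (4) of a finite-outcome rule for a constant slowing factor theta."""
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")

    def slowed(sigma: float) -> list[tuple[float, float]]:
        # Same outcome probabilities; each move is shrunk toward the current configuration.
        return [(p, sigma + theta * (nxt - sigma)) for p, nxt in rule(sigma)]

    return slowed


def expected_change(rule: Rule, sigma: float) -> float:
    """E[A(next)] - A(sigma) with the identity aggregator."""
    return sum(p * nxt for p, nxt in rule(sigma)) - sigma


def toy_rule(sigma: float) -> list[tuple[float, float]]:
    """Hypothetical rule: with probability sigma, move 30% of the way toward one; else stay."""
    return [(sigma, sigma + 0.3 * (1.0 - sigma)), (1.0 - sigma, sigma)]


slow = slow_version(toy_rule, theta=0.1)
print(expected_change(toy_rule, 0.5), expected_change(slow, 0.5))  # second is 0.1 times the first, up to rounding
```

Only the configuration update is slowed here; in the framework, the state of information would still be updated by the original rule.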

⁸We use that, for $a_j < 1$, $\prod_{j=1}^{\infty}(1-a_j)$ converges to a strictly positive number if $\sum_{j=1}^{\infty} a_j$ converges absolutely, which
follows from taking logarithms and the limit comparison test.
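Let $P^\theta = \{P^\theta_t\}_{t\in\mathbb{N}_0}$ be defined analogously to $P = \{P_t\}_{t\in\mathbb{N}_0}$, using the same updating rule $\Pi^{\mathcal{S}}$ for
the state of information. To wit, $P^\theta_t := \mathfrak{A}(\sigma^\theta_t)$, $\sigma^\theta_t = \Pi^\theta_t(f^\theta_{t-1}, \sigma^\theta_{t-1})$, and $f^\theta_t = \Pi^{\mathcal{S}}_t(f^\theta_{t-1}, \sigma^\theta_{t-1})$
for all $t \in \mathbb{N}$, and $f^\theta_0 = f_0 \in \mathcal{S}_0$ and $\sigma^\theta_0 = \sigma_0 \in \Theta$ are exogenously given. If $(\Pi, \mathfrak{A})$ satisfies WBERHR,
so does $(\Pi^\theta, \mathfrak{A})$ (for details, see Lemma 5 in the appendix). Furthermore, if $(\Pi, \mathfrak{A})$ satisfies BERHR and
$\inf_{t\in\mathbb{N}_0}\{\theta_t\} > 0$, then $(\Pi^\theta, \mathfrak{A})$ satisfies BERHR as well. Hence, as before, we can define $P^\theta_\infty := \lim_{t\to\infty} P^\theta_t$
for $(\Pi^\theta, \mathfrak{A})$. We shall say that $P^\theta$ is a slow version of $P$.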

Before turning to the results, we remark that the definitions of slow versions of updating rules
and systems suggest a further motivation for the interpretation that we give to states of information and
configurations throughout the paper. Whereas configurations change less in the slow versions of an updating
rule, the states of information are updated just as in the original model. In other words, the slow
versions and original systems differ in how they respond to new information, and hence in their
performance measure, but not in how information is updated.



Building on an argument by 



Lakshmivarahan and Thathachari ()1976l ). we obtain the following result: 



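Theorem 1. Suppose the system $(\Pi, \mathfrak{A})$ satisfies WBERHR with lower bound sequence $\delta = \{\delta_t\}_{t\in\mathbb{N}_0}$, $\sum_{t=0}^{\infty}\delta_t = \infty$, and $\mathfrak{A}(\sigma_0) > 0$. Then, for all $\varepsilon > 0$, the sequence $\theta = \{\theta_t\}_{t\in\mathbb{N}_0}$ with $\theta_t := (1 \wedge \delta_t)\cdot c \in (0,1)$, where $c$ is a constant depending only on $\sigma_0$ and $\varepsilon$, satisfies $\mathbb{P}(P^\theta_\infty = 1) > 1 - \varepsilon$. Furthermore, for any sequence $\eta = \{\eta_t\}_{t\in\mathbb{N}_0}$, where $\eta_t \in (0,1]$ is $\mathcal{F}_{[0,t]}$-measurable and satisfies $\sum_{t=0}^{\infty}\eta_t\theta_t = \infty$, we have $\mathbb{P}(P^{\theta\eta}_\infty = 1) > 1 - \varepsilon$.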

Intuitively, the first assertion of this theorem allows us to provide an arbitrarily high lower bound for the probability of achieving optimality by making the changes of the configuration small enough for each arrival of new information. The second assertion establishes that further slowing down cannot make achieving optimality less likely than this lower bound. Since BERHR implies WBERHR, the theorem also holds for systems that satisfy BERHR.

The first part of the argument relies on the fact that if $(\Pi, \mathfrak{A})$ satisfies WBERHR and $\theta = \{\theta_t\}_{t\in\mathbb{N}_0}$ is non-square-summable, then $P_\infty \in \{0,1\}$ and $P^\theta_\infty \in \{0,1\}$. Non-square-summability plays a role analogous to that of the non-shrinking condition in Oyarzun and Sarin (2012), yet it is much less restrictive. The rest of the proof of Theorem 1 follows the ideas in Lakshmivarahan and Thathachar (1976). However, since our result is more precise and holds in more general settings, our argument needs to be considerably more elaborate. We provide an informal discussion here and the full proof in Appendix A.

The proof of Theorem 1 is based on the idea of applying an increasing, concave function $\phi : [0,1] \to [0,1]$ with $\phi(0) = 0$ and $\phi(1) = 1$ to the submartingale $P$ such that $\phi(P_0) > 1 - \varepsilon$. In general, the new sequence $\{\phi(P_t)\}_{t\in\mathbb{N}_0}$ is not a submartingale. However, WBERHR yields a positive lower bound on the expected differences $P_{t+1} - P_t$, and $\phi$ is locally approximately linear. Applying $\phi$ to the slow version $P^\theta$, which only moves within a small neighborhood in each step, therefore corresponds to applying an almost linear function to a submartingale with a positive lower bound on its expected growth. Putting these ideas together and making them rigorous then shows that $\{\phi(P^\theta_t)\}_{t\in\mathbb{N}_0}$ is indeed a submartingale. Given that $P^\theta_\infty \in \{0,1\}$, one then obtains the statement, that is, $\mathbb{P}(P^\theta_\infty = 1) > 1 - \varepsilon$. The main ideas of this discussion are contained in the proof of Lemma 7. There, however, instead of constructing a new submartingale $\{\phi(P_t)\}_{t\in\mathbb{N}_0}$, we construct the components of a supermartingale using the same ideas, because this approach simplifies the technical arguments for Theorem 1 and the assertions below.
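For completeness, the last step of this sketch can be written out as follows. The derivation additionally assumes that $\phi$ is continuous, so that the bounded convergence theorem applies; the proof in the appendix may proceed differently:

$$\mathbb{P}\big(P^\theta_\infty = 1\big) = \mathbb{E}\big[\phi(P^\theta_\infty)\big] = \lim_{t\to\infty}\mathbb{E}\big[\phi(P^\theta_t)\big] \ge \phi(P^\theta_0) = \phi(P_0) > 1 - \varepsilon.$$

The justification for each step is:

- The first equality uses $P^\theta_\infty \in \{0,1\}$ together with $\phi(0) = 0$ and $\phi(1) = 1$.
- The second equality uses continuity of $\phi$ and $0 \le \phi \le 1$.
- The inequality is the submartingale property.
- $P^\theta_0 = P_0$ holds because $\sigma^\theta_0 = \sigma_0$.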

Taking a different point of view, slow updating leads to a higher probability of converging to one because standard deviations are not additive. In slow updating, one updating step is replaced by several updating steps. While the increase in expectation is additive, the standard deviations of the corresponding steps add up only subadditively. Thus, the ratio of the increase in expectation to the corresponding standard deviation grows, which raises the probability of converging to one.
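The following Python sketch illustrates this arithmetic under a simplifying assumption that is not part of the formal argument. One step with mean increment $m$ and standard deviation $s$ is replaced by $n$ independent substeps, each with mean $m/n$ and standard deviation $s/n$. The means add up to $m$, while the standard deviation of the sum is only $s/\sqrt{n}$. Hence the ratio of expected increase to standard deviation grows like $\sqrt{n}$.

```python
import math


def mean_to_sd_ratio(step_mean: float, step_sd: float, n_substeps: int) -> float:
    """Ratio of expected increase to standard deviation after splitting one step into n substeps.

    Illustrative assumption: the substeps are independent, each with mean step_mean / n
    and standard deviation step_sd / n.
    """
    sub_mean = step_mean / n_substeps
    sub_sd = step_sd / n_substeps
    total_mean = n_substeps * sub_mean               # expectations add up linearly
    total_sd = math.sqrt(n_substeps * sub_sd ** 2)   # variances add, so sd = step_sd / sqrt(n)
    return total_mean / total_sd


for n in (1, 4, 100):
    print(n, mean_to_sd_ratio(0.1, 0.5, n))  # ratio grows roughly like 0.2 * sqrt(n)
```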

The results provided so far consider a system and slowing sequences that can turn this system into a slow version that is likely to yield optimality. Some systems are primitively defined in such a manner that they can be interpreted as slow versions of other systems, with no need to introduce a slowing sequence. This allows us to analyze the asymptotic properties of $P = \{P_t\}_{t\in\mathbb{N}_0}$ without looking at the underlying system. Our next result follows this idea and yields a sufficient condition for $P$ to converge to one almost surely:

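Theorem 2. Consider an $\{\mathcal{F}_{[0,t]}\}_{t\in\mathbb{N}_0}$-adapted stochastic process $P = \{P_t\}_{t\in\mathbb{N}_0}$ with $P_t \in [0,1]$ and $P_0 > 0$ satisfying

1. "Non-summable relative hazard rates:" The sequence $P$ satisfies

$$\mathbb{E}_t[P_{t+1}] - P_t \ge \delta_t P_t (1 - P_t) \tag{5}$$

for some $\mathcal{F}_{[0,t]}$-measurable random variable $\delta_t \ge 0$, for all $t \in \mathbb{N}_0$, with $\sum_{t=0}^{\infty}\delta_t = \infty$.

2. "Slow version with strictly positive relative hazard rates:" There exist a strictly positive random variable $\delta > 0$ and a sequence $\theta = \{\theta_t\}_{t\in\mathbb{N}_0}$ of almost surely decreasing $\mathcal{F}_{[0,t]}$-measurable random variables $\theta_t \in (0,1]$ such that $P_t + (P_{t+1} - P_t)/\theta_t \in [0,1]$ and $\delta_t \ge \theta_t \delta$ for all $t \in \mathbb{N}_0$.

3. "Arbitrary slowing down over time:" The stopping time $\rho$, defined as

$$\rho := \min\{t \in \mathbb{N}_0 : P_t > \gamma\theta_t\}, \quad \text{with } \min\emptyset := \infty, \tag{6}$$

is almost surely finite for all $\gamma \in \mathbb{R}$.

Then, $\lim_{t\to\infty} P_t = 1$ almost surely.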
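The stopping time in (6) can be evaluated along a finite simulated or observed path. The helper below has a hypothetical name, not one from the paper, and returns the first index at which $P_t > \gamma\theta_t$. A finite path can only exhibit a finite $\rho$; it cannot verify that $\rho$ is almost surely finite for every $\gamma$.

```python
import math
from typing import Sequence


def first_passage_time(P: Sequence[float], theta: Sequence[float], gamma: float) -> float:
    """rho = min{t : P_t > gamma * theta_t} along a finite path; math.inf if the event never occurs."""
    for t, (p_t, theta_t) in enumerate(zip(P, theta)):
        if p_t > gamma * theta_t:
            return t
    return math.inf


# Hypothetical path: P stays at 0.2 while theta_t = 1 / (t + 1) decreases.
path = [0.2] * 4
slowing = [1.0 / (t + 1) for t in range(4)]
print(first_passage_time(path, slowing, 0.5))  # 2
print(first_passage_time(path, slowing, 1.0))  # inf (within this short horizon)
```

With a slowing sequence that keeps decreasing, any fixed $\gamma$ is eventually exceeded as long as $P$ stays away from zero. This is the intuition behind the third condition.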

This result requires, first, that $P$ satisfies a non-summable version of WBERHR in which the system is not explicitly specified. Second, it requires that $P$ can be interpreted as a slow version, with slowing sequence $\theta$, of a fictitious sequence that we do not explicitly define, but whose changes in each period are given by $(P_{t+1} - P_t)/\theta_t$. This fictitious sequence also satisfies (5) with $\delta_t$ replaced by $\delta$, for some $\delta > 0$. Finally, we require that the slowing sequence $\theta$ eventually gets arbitrarily small compared to the sequence $P$. The first two conditions, "Non-summable relative hazard rates" and "Slow version with strictly positive relative hazard rates," seem natural given our previous analysis, since they provide an interpretation of $P$ as a slow version of the performance measure of a system that satisfies WBERHR. The third condition, "Arbitrary slowing down over time," may seem more restrictive. However, as we shall illustrate below in Subsection 4.1.3, some models, such as the Roth-Erev learning model, satisfy all these properties. We thus provide an argument, different from those appearing in the literature, to prove the optimality of this model.

We remark that the condition in (5) is weaker than the WBERHR condition, since the latter has to hold for all configurations $\sigma \in \Sigma$. We can avoid this stronger assumption because here we do not construct a new sequence $P^\theta$ but only interpret $P$, step by step, as a slowed-down version of some other fictitious sequence.

The proof of Theorem 2 is similar to that of Theorem 1 and can be found in Appendix A. In the setup of the informal discussion of the proof of Theorem 1 above, we now slow down a fictitious sequence more and more. This allows us to increase the concavity (and thus the value of $\phi(P_t)$) in each step without losing the submartingale property of the process $\{\phi(P_t)\}$. We slow down this fictitious sequence until the submartingale is greater than $1 - \varepsilon$, for a given but arbitrary $\varepsilon$. The assumption of arbitrary slowing down over time guarantees that this happens almost surely in finite time. From this point on, the proof follows that of Theorem 1.

Theorem 1 is limited in the sense that it does not provide sufficient conditions for optimality almost surely. More importantly, we need to know the probability measure $\mathbb{P}$ or, more precisely, the sequence $\delta$ to pin down the slowing sequence $\theta$ that guarantees any given confidence level of achieving optimality. However, $\mathbb{P}$ is typically assumed to be unknown in applications. Corollary 1 addresses both problems by using a slowing sequence that decreases like the reciprocal of a linear function of time:

Corollary 1. Suppose the system $(\Pi, \mathfrak{A})$ satisfies BERHR and $\mathfrak{A}(\sigma_0) > 0$. Then the slowing sequence $\theta = \{\theta_t\}_{t\in\mathbb{N}_0}$, where $\theta_t = 1/(t+2)$, satisfies $\mathbb{P}(P^\theta_\infty = 1) = 1$.

Proof. For the sequence $P^\theta$ of the statement, we check that the assumptions of Theorem 2 are satisfied. First of all, by Lemma 5 in Appendix A and the fact that $\sum_{t=0}^{\infty}\theta_t\delta_t = \infty$ almost surely, we obtain that $P^\theta$ satisfies the non-summable relative hazard rates property. The slow version property is obvious by the definition of $P^\theta$. Finally, Lemma 9 in the appendix yields the arbitrary slowing down property of $P^\theta$. □



Recall that in the example by Viswanathan and Narendra (1972) in Section 2, optimality may not occur. The reason is that the probability of choosing one action once, then twice, ..., then $N$ times in a row, with at least one success in each block, does not go to zero as $N$ tends to $\infty$. This implies that the probability of choosing only one action (any of them) at all times is strictly positive. Corollary 1 reveals that this would not be the case if we replaced $1 - \beta$ by $(1 - \beta)/(t+1)$ in the underlying updating function given in (2). The slowed speed of adjustment of the configuration prevents the automaton from always choosing the suboptimal action with strictly positive probability. Similarly, as Theorem 1 reveals, that probability could be made arbitrarily small if one replaced $1 - \beta$ by $\theta(1 - \beta)$ for some sufficiently small constant $\theta > 0$.
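A small Monte Carlo sketch can make the role of the step size concrete. We do not reproduce the updating function (2) here. As a stand-in, the code uses a two-action linear reward-inaction automaton:

- A success moves probability toward the chosen action by a fraction `step`, which plays the role of $1 - \beta$.
- A failure leaves probabilities unchanged.

The success probabilities, horizon, and number of runs are illustrative assumptions. The function reports the fraction of runs that end with less than one half of the probability on the better action.

```python
import random


def lock_in_rate(step: float, success_probs=(0.8, 0.6), horizon: int = 1500,
                 runs: int = 200, seed: int = 0) -> float:
    """Fraction of runs ending with probability < 1/2 on action 0 (the better action).

    Hypothetical stand-in for the automaton: linear reward-inaction with a constant step.
    """
    rng = random.Random(seed)
    wrong = 0
    for _ in range(runs):
        p0 = 0.5  # probability of choosing action 0; action 1 gets 1 - p0
        for _ in range(horizon):
            a = 0 if rng.random() < p0 else 1
            if rng.random() < success_probs[a]:  # success: move toward the chosen action
                p0 = p0 + step * (1.0 - p0) if a == 0 else p0 - step * p0
            # failure: no change ("inaction")
        if p0 < 0.5:
            wrong += 1
    return wrong / runs


print("large step:", lock_in_rate(0.5))
print("small step:", lock_in_rate(0.05))
```

In runs of this kind, the large constant step locks into the worse action noticeably often, while the small constant step rarely does. This is consistent with the statement that shrinking the step makes suboptimal lock-in arbitrarily unlikely, but it is no substitute for the theorem.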

4 Applications to learning 

In this section, we apply the asymptotic results of the last section within a framework that nests several 
models of individual and social learning. 

Notation. We consider a finite set $W$ of individuals that we call the population. These individuals choose actions in some finite set $A$ that yield some payoff in a bounded set $X := [x_{\min}, x_{\max}]$, with $-\infty < x_{\min} < x_{\max} < \infty$. Formally, if individual $i \in W$ chooses action $a \in A := \{1, \ldots, |A|\}$ at time $t \in \mathbb{N}$, she obtains a payoff, denoted by $x^{(i)}_t(a)$ or just $x^{(i)}_t$. We denote individual $i$'s chosen action at time $t$ by $a^{(i)}_t$. Individual $i$ may only observe the payoff she obtained, or she may observe both the obtained and forgone payoffs, i.e., the whole profile $\{x^{(i)}_t(a)\}_{a\in A}$. In these cases, the individual is said to have partial information or full information, respectively. We will consider these two possibilities below when we analyze models of individual learning.

Since we also consider applications to social learning, we allow individuals to observe the chosen action(s) and obtained payoff(s) of other individual(s) in the population as well. The set of individuals that $i \in W$ observes at time $t \in \mathbb{N}$ (including herself) is called the sample and is denoted by $s^{(i)}_t$. The profile of actions chosen by the individuals in a sample $s$ at time $t \in \mathbb{N}$ is denoted by $a^{(s)}_t$ for all (non-empty) $s \in \mathcal{P}(W)$, where $\mathcal{P}(W)$ denotes the set of all subsets of $W$. The corresponding profile of payoffs is denoted by $x^{(s)}_t(a^{(s)}_t)$, or simply by $x^{(s)}_t$.

Behavioral rule. The choices of each individual $i \in W$ are described by a probability distribution over the set of actions, denoted by $\sigma^{(i)}_t \in \Delta(A)$, representing the likelihood of her choosing each action $a \in A$ at time $t + 1 \in \mathbb{N}$. Upon observing the new information at time $t \in \mathbb{N}$, e.g., obtained payoffs or other individuals' choices and payoffs, the probabilities of choosing each action are updated according to a function called the behavioral rule, denoted by $L^{(i)}_t$. In general, the behavioral rule is a function of the information available to individual $i$ at time $t$, and its specification varies across the different learning models we consider.

In principle, the analysis here could allow for more possibilities. For instance, individuals could observe the forgone payoffs of other individuals as well. Our choice of the level of generality of the analysis reflects our attempt to capture the main elements that drive the results.

General framework. We now specify the general framework introduced in Subsection 2.1 for the learning setup.

• Within this setup, the set of states that may occur at time $t \in \mathbb{N}$ corresponds to the set $\Omega_t = [0,1]^{|W|} \times (X^{|A|} \times \mathcal{P}(W))^{|W|}$. Here $[0,1]^{|W|}$ corresponds to all possible realizations of a randomization device that, as described below, determines the actual choice of each individual given her probabilistic rule of choice. The other components of the state correspond to the obtained and forgone payoffs, $\{x^{(i)}_t(a)\}_{i\in W, a\in A}$, and the observed samples, $\{s^{(i)}_t\}_{i\in W}$. We denote the corresponding Borel $\sigma$-algebra by $\mathcal{F}_t$.

• The set of states of information at time $t \in \mathbb{N}_0$, $\Xi_t$, corresponds to possible past information (or some of its summary statistics) that an individual might have obtained. For example, in the Roth-Erev learning model, which we discuss in Subsection 4.1.3, a state of information is the vector of "attractions" the individual uses to determine the probability of choosing each action.

• The configuration, $\sigma$, corresponds to the probability of choosing each action in models of individual learning. In models of social learning, $\sigma$ corresponds to a profile of vectors of probabilities $\{\sigma^{(i)}(a)\}_{i\in W, a\in A}$; here $\sigma^{(i)}(a)$ represents the probability that individual $i$ chooses action $a$. Thus, in either case, $\Sigma = (\Delta(A))^{|W|}$, with $|W| = 1$ in models of individual learning.

• In order to introduce the aggregator, we first fix, for each $i \in W$, a set $A^* \subset A$ of optimal actions. For instance, $A^*$ may be the set of expected payoff maximizing actions of $i$. This set is determined by the environment, i.e., the underlying probability distribution $\mathbb{P}$, which we discuss below. The aggregator function is then defined as $\mathfrak{A} : \Sigma \to [0,1]$, with

$$\mathfrak{A}(\sigma) := \frac{1}{|W|}\sum_{i\in W}\sum_{a\in A^*}\sigma^{(i)}(a) \tag{7}$$

for all $\sigma \in \Sigma$.

• In the models of individual learning, the updating rule at time $t \in \mathbb{N}$, $\Pi_t$, corresponds to the behavioral rule. In the models of imitation, it is determined by the profile of behavioral rules $\{L^{(i)}_t\}_{i\in W}$. The information updating rule at time $t \in \mathbb{N}$, $\Pi^\Xi_t$, is defined specifically in each setup.

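• It remains to specify the probability distribution $\mathbb{P}$ over $(\Omega, \mathcal{F})$. First of all, we rewrite $\Omega$ as the product space of two spaces $\Omega^1$ and $\Omega^2$ (along with the corresponding $\sigma$-algebras); more precisely, we write

$$\Omega = \prod_{t=1}^{\infty}\Omega_t = \prod_{t=1}^{\infty}\big(\Omega^1_t \times \Omega^2_t\big) = \prod_{t=1}^{\infty}\Omega^1_t \times \prod_{t=1}^{\infty}\Omega^2_t = \Omega^1 \times \Omega^2,$$

with $\Omega^1_t := [0,1]^{|W|}$, $\Omega^2_t := (X^{|A|} \times \mathcal{P}(W))^{|W|}$, and $\Omega_t = \Omega^1_t \times \Omega^2_t$ for all $t \in \mathbb{N}$. We will always assume that $\mathbb{P}$ is the product measure of two probability measures; to wit, $\mathbb{P} = \mathbb{P}^1 \times \mathbb{P}^2$, with $\mathbb{P}^1$ and $\mathbb{P}^2$ defined over the respective spaces. In particular, the two components of each state of the world are independent.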

- The first component of $\mathbb{P}$, $\mathbb{P}^1$, provides a technical randomization device that links the updating rule to the probabilistic interpretation of a configuration. To be precise, $\mathbb{P}^1$ is a product measure (across times) of product measures (across individuals) of $|W|$ uniformly distributed random variables over $[0,1]$. Each $\sigma \in \Sigma$ can be interpreted as a matrix of parameters of a family of $|W|$ independent multinomial distributions over the set of actions $A$. Each individual $i$ partitions the interval $[0,1]$ into $|A|$ intervals whose measures are the elements of $(\sigma^{(i)}(a))_{a\in A}$. Action $a$ is chosen if the realization of the randomization device falls in the interval corresponding to that action. Formally, $\omega \in \Omega_t$ (or, more precisely, the components of $\omega$ in $\Omega^1_t$) and $\sigma \in \Sigma$ determine the profile of choices $\{a^{(i)}_t\}_{i\in W} = \{a^{(i)}_t(\omega, \sigma^{(i)})\}_{i\in W}$ at each time $t \in \mathbb{N}$.

- The second component of $\mathbb{P}$, $\mathbb{P}^2$, specifies the environment in which individuals live, such as the distribution of the payoffs. The assumptions it satisfies vary across the setups that we discuss below. Individuals are not assumed to know $\mathbb{P}^2$; in particular, they do not know the distribution over the payoff profile. We denote by $x^{(s)}_t = x^{(s)}_t(a^{(s)}, \omega)$ the profile of payoffs obtained at time $t \in \mathbb{N}$ by a sample of individuals $s$ if they choose the actions $a^{(s)}$ and the state of the world at $t$ is $\omega \in \Omega_t$.
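The randomization device and the aggregator can be sketched in Python as follows. This illustration rests on three assumptions:

- A configuration for one individual is a list of probabilities indexed by actions $0, \ldots, |A|-1$.
- The aggregator averages the probability of optimal actions over individuals, so that its value lies in $[0,1]$.
- The helper names are hypothetical.

A uniform draw $u \in [0,1)$ selects the action whose interval, in the partition of $[0,1]$ described above, contains $u$.

```python
import bisect
import itertools
import random
from typing import Dict, Sequence, Set


def choose_action(sigma_i: Sequence[float], u: float) -> int:
    """Randomization device: split [0, 1) into consecutive intervals of lengths sigma_i[a]
    and return the action whose interval contains the uniform draw u."""
    upper_ends = list(itertools.accumulate(sigma_i))
    # First interval whose upper end exceeds u; min() guards against round-off at the last end.
    return min(bisect.bisect_right(upper_ends, u), len(sigma_i) - 1)


def aggregator(sigma: Dict[str, Sequence[float]], optimal: Set[int]) -> float:
    """Average over individuals of the probability placed on optimal actions."""
    return sum(sum(probs[a] for a in optimal) for probs in sigma.values()) / len(sigma)


rng = random.Random(1)
counts = [0, 0, 0]
for _ in range(10_000):
    counts[choose_action([0.2, 0.5, 0.3], rng.random())] += 1
print([c / 10_000 for c in counts])  # empirical frequencies close to [0.2, 0.5, 0.3]
print(aggregator({"i1": [0.2, 0.5, 0.3], "i2": [0.6, 0.1, 0.3]}, {0}))  # 0.4 up to round-off
```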

4.1 Individual learning 

In this subsection, we focus on individual learning; thus, $|W| = 1$. At each time $t \in \mathbb{N}$, the individual observes the payoff she would have obtained from any action if she had chosen that action, or some part of this information. The behavioral rule, denoted by $L_t$, maps three inputs to the probability of choosing each action at time $t + 1$: the current probabilities of choosing each action at time $t \in \mathbb{N}$, the past information available to the individual, and the observed part of the vector of actions and corresponding payoffs.



The use of randomization devices is standard and well understood. We introduce this explicitly here to make sure our notation in the sequel is transparent.

For the analysis of individual learning we omit the superscript $(i)$.
We consider two cases below: partial information, where the individual only observes the payoff of the action she chose, and full information, where the individual observes the payoff that she would have obtained with each action. As a third example, we study the Roth-Erev learning model within our setup.

4.1.1 Partial information 

First, we consider a model of individual learning with partial information, where the individual only observes the payoff she obtained from the action she chose at each time $t \in \mathbb{N}$. We thus assume that $\Xi_0$ is an arbitrary singleton, that $\Xi_t = \Xi_{t-1} \times A \times X$, and that $\Pi^\Xi_t(\omega, \xi, \sigma) = \xi \times \big(a_t(\omega,\sigma), x_t(a_t(\omega,\sigma), \omega)\big)$ for all $\omega \in \Omega_t$, $\xi \in \Xi_{t-1}$, $\sigma \in \Delta(A)$, and $t \in \mathbb{N}$. Here $\big(a_t(\omega,\sigma), x_t(a_t(\omega,\sigma), \omega)\big) \in A \times X$ corresponds to the chosen action and the obtained payoff. The action the individual chooses is determined by the randomization device described above. The behavioral rule here is specified as

$$L_t : A \times X \times \Xi_{t-1} \times \Delta(A) \to \Delta(A) = \Sigma,$$

for all $t \in \mathbb{N}$. We then have that

$$\Pi_t(\omega, \xi, \sigma) = L_t\big(a_t(\omega,\sigma), x_t(a_t(\omega,\sigma), \omega), \xi, \sigma\big) \tag{8}$$

for all $t \in \mathbb{N}$, $\omega \in \Omega_t$, $\xi \in \Xi_{t-1}$, and $\sigma \in \Sigma$. We need not make any further assumptions on $\mathbb{P}^2$, the probability measure that specifies the environment.

In this application, we consider a class of behavioral rules called monotone learning rules. They are defined as those that lead to an increase in the expected conditional probability of choosing an expected payoff maximizing action. These rules have been characterized by Börgers, Morales, and Sarin (2004), who consider a one-period model and specify behavioral rules "locally," i.e., for a fixed configuration $\sigma$ in the interior of $\Delta(A)$. We extend their setup in the natural way and observe that their characterization yields the following. A learning rule $L := \{L_t\}_{t\in\mathbb{N}}$ is monotone if and only if there exist sequences of functions $\{(A_{b,a,t} : \Xi_{t-1} \times \Delta(A) \to \mathbb{R})_{a,b\in A}\}_{t\in\mathbb{N}}$ and $\{(B_{b,a,t} : \Xi_{t-1} \times \Delta(A) \to \mathbb{R})_{a,b\in A}\}_{t\in\mathbb{N}}$ such that

$$L_t(a, x, \xi, \sigma)_a = \sigma(a) + (1 - \sigma(a))\big(A_{a,a,t}(\xi,\sigma) + B_{a,a,t}(\xi,\sigma)\,x\big), \tag{9}$$

$$L_t(b, x, \xi, \sigma)_a = \sigma(a) - \sigma(a)\big(A_{b,a,t}(\xi,\sigma) + B_{b,a,t}(\xi,\sigma)\,x\big), \tag{10}$$

where the coefficients $A_{b,a,t}(\xi,\sigma)$ and $B_{b,a,t}(\xi,\sigma)$ satisfy



We emphasize again that we do not strive for the most general formulation. One might, for example, consider information sets that contain the payoffs of unobserved actions or their average value; more generally, $\xi_t$ could be a random variable that is measurable with respect to $\mathcal{F}_{[0,t]}$. The analysis of this section would then work out in the same manner.
for all $t \in \mathbb{N}$, $\xi \in \Xi_{t-1}$, $\sigma \in \Delta(A)$, $a \in A$, $b \in A \setminus \{a\}$, and $x \in X$. Moreover, for every non-empty strict subset $C$ of $A$, there exist $b \in C$ and $a \in A \setminus C$ such that $B_{b,a,t}(\xi,\sigma) > 0$.

We now provide bounds for the relative hazard rates of the probability of choosing optimal actions. As Börgers, Morales, and Sarin (2004) show,

$$\mathbb{E}_t[\sigma_{t+1}(a)] - \sigma_t(a) = \sigma_t(a)\sum_{b\in A\setminus\{a\}} B_{b,a,t+1}(\xi_t,\sigma_t)\,\sigma_t(b)\big(\mathbb{E}_t[x_{t+1}(a)] - \mathbb{E}_t[x_{t+1}(b)]\big) \tag{11}$$

for all $a \in A$ and $t \in \mathbb{N}_0$. Consider the updating rule determined by this class of behavioral rules, together with the aggregator function defined by the expected payoff maximization criterion. It follows that they define a system that satisfies the WBERHR condition:

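Lemma 1. Let

$$A^* := \big\{a \in A : \mathbb{E}_t[x_{t+1}(a)] \ge \mathbb{E}_t[x_{t+1}(b)] \text{ almost surely } \forall b \in A, \forall t \in \mathbb{N}_0\big\}. \tag{12}$$

Then, the system $(\Pi, \mathfrak{A})$, defined by (7), (9), and (10), satisfies WBERHR, with lower bound sequence $\delta = \{\delta_t\}_{t\in\mathbb{N}_0}$, where

$$\delta_t := \inf_{\xi\in\Xi_t,\,\sigma\in\Delta(A)}\ \min_{a\in A^*,\,b\in A\setminus A^*}\Big\{B_{b,a,t+1}(\xi,\sigma)\big(\mathbb{E}_t[x_{t+1}(a)] - \mathbb{E}_t[x_{t+1}(b)]\big)\Big\} \ge 0,$$

for all $t \in \mathbb{N}_0$.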
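Equation (11), which drives Lemma 1, can be checked numerically for a concrete member of the class. The sketch below assumes $X = [0,1]$ and uses a Cross-type rule, i.e., $A_{b,a,t} \equiv 0$ and $B_{b,a,t} \equiv 1$ in (9) and (10). This rule is affine in the payoff, and the chosen action is drawn independently of the payoffs (the product structure $\mathbb{P} = \mathbb{P}^1 \times \mathbb{P}^2$). Therefore the expected next-period probabilities depend on payoffs only through their conditional means, here the hypothetical vector `means`.

```python
from typing import List, Sequence


def cross_update(sigma: Sequence[float], chosen: int, x: float) -> List[float]:
    """Cross-type monotone rule: A_{b,a,t} = 0 and B_{b,a,t} = 1 in (9)-(10), payoffs in [0, 1]."""
    return [s + (1.0 - s) * x if a == chosen else s - s * x for a, s in enumerate(sigma)]


def expected_change_direct(sigma: Sequence[float], means: Sequence[float]) -> List[float]:
    """E_t[sigma_{t+1}(a)] - sigma_t(a), averaging over which action is chosen.

    The rule is affine in x, so each payoff can be replaced by its conditional mean.
    """
    expected_next = [0.0] * len(sigma)
    for chosen, p_choose in enumerate(sigma):
        nxt = cross_update(sigma, chosen, means[chosen])
        for a, value in enumerate(nxt):
            expected_next[a] += p_choose * value
    return [e - s for e, s in zip(expected_next, sigma)]


def expected_change_formula(sigma: Sequence[float], means: Sequence[float]) -> List[float]:
    """Right-hand side of (11) with B_{b,a,t+1} = 1 (the b = a term is zero anyway)."""
    n = len(sigma)
    return [sigma[a] * sum(sigma[b] * (means[a] - means[b]) for b in range(n)) for a in range(n)]


sigma = [0.5, 0.3, 0.2]
means = [0.9, 0.4, 0.1]  # hypothetical conditional expected payoffs
print(expected_change_direct(sigma, means))
print(expected_change_formula(sigma, means))  # same values up to round-off
```

Summing the right-hand side of (11) over the actions with the highest mean gives exactly the drift of the aggregator. This is why Lemma 1 follows directly from (11).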

The proof of the last lemma, which follows from (11), is trivial and is thus omitted. For the specific class of monotone learning rules, Theorem 1 and Corollary 1 therefore yield the following corollary. Recall that $P = \{\mathfrak{A}(\sigma_t)\}_{t\in\mathbb{N}_0}$ denotes the performance measure.

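Corollary 2. With the notation and assumptions of Lemma 1, the following holds:

1. Suppose that $\sum_{t=0}^{\infty}\delta_t = \infty$ and $P_0 > 0$. Then, for all $\varepsilon > 0$, there exists a slowing sequence $\theta = \{\theta_t\}_{t\in\mathbb{N}_0}$ such that $\mathbb{P}(P^\theta_\infty = 1) > 1 - \varepsilon$. Here $P^\theta$ corresponds to the system $(\Pi^\theta, \mathfrak{A})$ with parameters $A^\theta_{b,a,t}(\xi,\sigma) := \theta_{t-1}A_{b,a,t}(\xi,\sigma)$ and $B^\theta_{b,a,t}(\xi,\sigma) := \theta_{t-1}B_{b,a,t}(\xi,\sigma)$ in (9) and (10), for all $a, b \in A$, $\xi \in \Xi_{t-1}$, $\sigma \in \Delta(A)$, and $t \in \mathbb{N}$.

2. Suppose that $P_0 > 0$ and

$$\inf_{t\in\mathbb{N}_0}\ \min_{a\in A^*,\,b\in A\setminus A^*}\big\{\mathbb{E}_t[x_{t+1}(a)] - \mathbb{E}_t[x_{t+1}(b)]\big\} > 0,$$

and, for the sake of simple notation, assume that $X = [0,1]$. Then, for any $c \in (0,1]$, consider the behavioral rule with $A_{b,a,t}(\xi,\sigma) = (1-c)/(t+1)$ and $B_{b,a,t}(\xi,\sigma) = c/(t+1)$ in (9) and (10), for all $a, b \in A$, $\xi \in \Xi_{t-1}$, $\sigma \in \Delta(A)$, and $t \in \mathbb{N}$. This rule satisfies $\mathbb{P}(P_\infty = 1) = 1$.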
We remark that the analysis does not require stationarity or independence of the actions' payoffs. As long as $A^* \neq \emptyset$, we obtain, for example, optimality almost surely with the rule of the second part of the last corollary. Thus, the corresponding behavioral rule is robust not only in the short run, as shown in Börgers, Morales, and Sarin (2004), but also in the long run. The individual does not need to know the payoff distributions. Nevertheless, she almost surely comes to choose an expected payoff maximizing action in the long run, as long as her behavior is determined by a behavioral rule that gets arbitrarily slow over time.
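To illustrate the second part of Corollary 2, the sketch below simulates the behavioral rule with $A_{b,a,t} = (1-c)/(t+1)$ and $B_{b,a,t} = c/(t+1)$. The environment is a hypothetical two-action setting with Bernoulli payoffs in $X = [0,1]$, in which action 0 is optimal. A finite simulation cannot establish almost-sure convergence. It only shows the average probability of the optimal action drifting upward from its initial value of one half.

```python
import random


def final_optimal_prob(c: float = 1.0, success_probs=(0.9, 0.3), horizon: int = 5000,
                       seed: int = 0) -> float:
    """One run of rule (9)-(10) with A_{b,a,t} = (1-c)/(t+1) and B_{b,a,t} = c/(t+1).

    Two actions, hypothetical Bernoulli payoffs in X = [0, 1]; action 0 is optimal.
    Returns the final probability of choosing action 0.
    """
    rng = random.Random(seed)
    sigma = [0.5, 0.5]
    for t in range(1, horizon + 1):
        a = 0 if rng.random() < sigma[0] else 1
        x = 1.0 if rng.random() < success_probs[a] else 0.0
        k = ((1.0 - c) + c * x) / (t + 1)  # A + B * x, never larger than 1 / (t + 1)
        # (9) for the chosen action, (10) for the other action
        sigma = [s + (1.0 - s) * k if b == a else s - s * k for b, s in enumerate(sigma)]
    return sigma[0]


finals = [final_optimal_prob(seed=s) for s in range(100)]
print(sum(finals) / len(finals))  # average final probability of the optimal action
```

Because the step size at time $t$ is at most $1/(t+1)$, the configuration always stays in the simplex. At the same time, the expected drift of the optimal action's probability remains non-summable, which matches the mechanism behind Corollary 1.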

4.1.2 Full information 

Now we consider a model of individual learning with full information. In each period, the individual observes the payoff she obtained and the forgone payoffs. We thus assume that $\Xi_0$ is an arbitrary singleton, that $\Xi_t = \Xi_{t-1} \times X^{|A|}$, and that $\Pi^\Xi_t(\omega, \xi, \sigma) = \xi \times (x_t(a, \omega))_{a\in A}$ for all $t \in \mathbb{N}$.

In this setup, a construction with dynamics similar to those in Börgers, Morales, and Sarin (2004) can be provided in terms of a class of functions that we have used to describe imitation models in our previous work, Oyarzun and Ruf (2009). In the full information setting, the behavioral rule maps the current information state, the probabilities of choosing each action, and the obtained and forgone payoffs to the probability of choosing each action in the next period. Thus, the behavioral rule here is specified as

$$L_t : X^{|A|} \times \Xi_{t-1} \times \Delta(A) \to \Delta(A) = \Sigma,$$

for all $t \in \mathbb{N}$. We then have

$$\Pi_t(\omega, \xi, \sigma) = L_t\big((x_t(a, \omega))_{a\in A}, \xi, \sigma\big) \tag{13}$$

for all $t \in \mathbb{N}$, $\xi \in \Xi_{t-1}$, $\omega \in \Omega_t$, and $\sigma \in \Sigma$.

We interpret the class of behavioral rules that we analyze here as if the individual were making pairwise comparisons between all possible pairs of actions and moving probability from one action to the other. We start by specifying a class of functions that are useful to construct such behavioral rules. These functions describe how probability is switched from one action to the other in each pairwise comparison. These functions are anti-symmetric; in particular, no probability is swapped when the individual compares two actions that yielded the same payoff. Furthermore, when the payoff of one action is greater than the payoff of another action, probability is moved toward the action that yielded the higher payoff in the corresponding pairwise comparison.

Definition 2. We say that a function $g : X^2 \to [-1,1]$ is symmetric-switch if

1. $g(x_1, x_2) = -g(x_2, x_1)$ for all $x_1, x_2 \in X$,

2. $g(x_1, \cdot)$ is non-decreasing for all $x_1 \in X$, and

3. $g(x_1, x_2) \ge 0$ for all $x_1, x_2 \in X$ with $x_2 > x_1$.

Functions of the type $g(x_1, x_2) = b(x_2 - x_1)$, for some sufficiently small scalar $b > 0$, are examples of symmetric-switch functions that play an important role in our analysis.
Using symmetric-switch functions, we construct the behavioral rule

$$L_t\big((x(c))_{c\in A}, \xi, \sigma\big)_a = \sigma(a) + \sigma(a)\sum_{b\in A\setminus\{a\}}\sigma(b)\,g_{b,a,t}\big(x(b), x(a), \xi, \sigma\big), \tag{14}$$

where $g_{b,a,t}(\cdot,\cdot,\xi,\sigma) = g_{a,b,t}(\cdot,\cdot,\xi,\sigma) : X^2 \to [-1,1]$ is symmetric-switch, for all $t \in \mathbb{N}$, $\xi \in \Xi_{t-1}$, $\sigma \in \Delta(A)$, $a \in A$, and $b \in A \setminus \{a\}$. For a specific example, consider a model of complete information adaptive learning with $X = [0,1]$. If the current probabilities of choosing actions are described by $\sigma \in \Delta(A)$, then the probability of choosing action $a \in A$ in the following period is

$$L_t\big((x(c))_{c\in A}, \xi, \sigma\big)_a = \sigma(a) + \sigma(a)\sum_{b\in A\setminus\{a\}}\sigma(b)\,\beta(t, a, b, \xi, \sigma)\big(x(a)^p - x(b)^p\big), \tag{15}$$

with $\beta(\cdot,\cdot,\cdot,\cdot,\cdot) \in [0,1]$ and $p > 0$, for all $a \in A$, $x \in X^{|A|}$, $\xi \in \Xi_{t-1}$, $\sigma \in \Delta(A)$, and $t \in \mathbb{N}$.
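Rule (15) can be sketched directly in Python. The code assumes $X = [0,1]$ and a constant $\beta$, whereas the text allows $\beta$ to depend on $t$, the pair of actions, the information state, and the configuration. The function name is hypothetical. With a constant $\beta$, each pairwise term is antisymmetric, so the transfers cancel in the aggregate and the output is again a probability vector.

```python
from typing import List, Sequence


def pairwise_switch_update(sigma: Sequence[float], payoffs: Sequence[float],
                           beta: float, p: float = 1.0) -> List[float]:
    """Rule (15) with a constant beta in [0, 1]; obtained and forgone payoffs lie in X = [0, 1]."""
    n = len(sigma)
    new_sigma = []
    for a in range(n):
        # Net probability moved toward a over all pairwise comparisons with other actions b
        switch = sum(sigma[b] * beta * (payoffs[a] ** p - payoffs[b] ** p)
                     for b in range(n) if b != a)
        new_sigma.append(sigma[a] + sigma[a] * switch)
    return new_sigma


print(pairwise_switch_update([0.5, 0.3, 0.2], [1.0, 0.5, 0.0], beta=0.5))
# approximately [0.5875, 0.2775, 0.135]
```

A slowly updating variant, as in the second part of Corollary 3, would pass `beta = c / (t + 1)` at time $t$.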

We now provide sufficient conditions for a behavioral rule of this type to yield WBERHR for two 
optimality criteria. 

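Lemma 2. Consider two cases:

1. Let $A^*$ be defined as in (12) and assume that $g_{b,a,t}(x_1, x_2, \xi, \sigma) = C_{b,a,t}(x_2 - x_1)$ for some scalar $C_{b,a,t} > 0$, for all $t \in \mathbb{N}$, $\xi \in \Xi_{t-1}$, $\sigma \in \Delta(A)$, $a \in A$, $b \in A \setminus \{a\}$, and $x_1, x_2 \in X$.

2. Define

$$A^* := \big\{a \in A : \mathbb{E}_t[u(x_{t+1}(a))] \ge \mathbb{E}_t[u(x_{t+1}(b))] \text{ almost surely } \forall b \in A, \forall t \in \mathbb{N}_0, \forall u \in \mathcal{U}\big\}, \tag{16}$$

where $\mathcal{U}$ denotes the set of all bounded, non-decreasing functions $u : X \to \mathbb{R}$. Suppose that, given the information up to time $t$, $x_{t+1}(a)$ and $x_{t+1}(b)$ are pairwise independent for all $a, b \in A$. Suppose also that $g_{b,a,t}(\cdot,\cdot,\xi,\sigma)$ is symmetric-switch for all $t \in \mathbb{N}$, $\xi \in \Xi_{t-1}$, $\sigma \in \Delta(A)$, $a \in A$, and $b \in A \setminus \{a\}$.

Then, in either of the two cases, the system $(\Pi, \mathfrak{A})$, defined by (7), (13), and (14), satisfies WBERHR, with lower bound sequence $\delta = \{\delta_t\}_{t\in\mathbb{N}_0}$, where

$$\delta_t := \inf_{\xi\in\Xi_t,\,\sigma\in\Sigma}\ \min_{a\in A^*,\,b\in A\setminus A^*}\Big\{\mathbb{E}_t\big[g_{b,a,t+1}(x_{t+1}(b), x_{t+1}(a), \xi, \sigma)\big]\Big\} \ge 0,$$

for all $t \in \mathbb{N}_0$.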



In other words, here we define $A^*$ as the set of actions whose payoff distribution at time $t$, given the information up to time $t - 1$, first-order stochastically dominates the payoff distribution at time $t$, given the information up to time $t - 1$, of all other actions, for all $t \in \mathbb{N}$. We remind the reader that a random variable $X$ with cumulative distribution function $F_X$ first-order stochastically dominates a random variable $Y$ with cumulative distribution function $F_Y$ if $F_X(z) \le F_Y(z)$ for all $z \in \mathbb{R}$.
Proof. In the first case, we clearly have that $\delta_t \ge 0$. In the second case, fix $a \in A^*$, $b \in A \setminus \{a\}$, $t \in \mathbb{N}_0$, $\xi \in \Xi_t$, $\sigma \in \Sigma$, and the symmetric-switch function $g(\cdot,\cdot) = g_{b,a,t+1}(\cdot,\cdot,\xi,\sigma)$. By the rule of iterated expectations we obtain that

$$\begin{aligned}
\mathbb{E}_t\big[g(x_{t+1}(b), x_{t+1}(a))\big] &= \mathbb{E}_t\Big[\mathbb{E}\big[g(x_{t+1}(b), x_{t+1}(a)) \,\big|\, \mathcal{F}_{[0,t]} \vee \sigma(x_{t+1}(b))\big]\Big]\\
&\ge \mathbb{E}_t\Big[\mathbb{E}\big[g(x_{t+1}(b), \tilde{x}_{t+1}(b)) \,\big|\, \mathcal{F}_{[0,t]} \vee \sigma(x_{t+1}(b))\big]\Big]\\
&= 0,
\end{aligned}$$

where $\sigma(x_{t+1}(b))$ is the $\sigma$-algebra generated by $x_{t+1}(b)$ and $\tilde{x}_{t+1}(b)$ is an independent copy of $x_{t+1}(b)$. The inequality follows from two facts: $g$ is non-decreasing in the second component, and $x_{t+1}(a)$ first-order stochastically dominates $x_{t+1}(b)$. The last equality follows from the anti-symmetry property of $g$. This yields $\delta_t \ge 0$.

We proceed by observing that, in both cases, $\mathbb{E}_t[g(x_{t+1}(b), x_{t+1}(a))] = 0$ if $b \in A^*$. The reason is that $x_{t+1}(a)$ and $x_{t+1}(b)$ then have either the same expectation (in the first case) or the same distribution (in the second case). We then use (7), (13), and (14) to obtain

$$\mathbb{E}_t\big[\mathfrak{A}(\Pi_{t+1}(\xi, \sigma))\big] - \mathfrak{A}(\sigma) = \sum_{a\in A^*}\sigma(a)\sum_{b\in A\setminus A^*}\sigma(b)\,\mathbb{E}_t\big[g_{b,a,t+1}(x_{t+1}(b), x_{t+1}(a), \xi, \sigma)\big]$$

for all $\xi \in \Xi_t$, $\sigma \in \Sigma$, and $t \in \mathbb{N}_0$, which yields the statement. □
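The inequality in the proof can be illustrated with exact arithmetic on hypothetical finite payoff distributions. The sketch uses the sign function $g(x_1, x_2) = \mathrm{sign}(x_2 - x_1)$, which satisfies the three conditions of Definition 2. It computes $\mathbb{E}[g(X_b, X_a)]$ for independent $X_a$ and $X_b$, where $X_a$ first-order stochastically dominates $X_b$. Distributions are dictionaries from payoff values to probabilities; all names are illustrative.

```python
from fractions import Fraction
from typing import Callable, Dict

Dist = Dict[int, Fraction]  # hypothetical finite payoff distribution: value -> probability


def sign_switch(x1: int, x2: int) -> int:
    """g(x1, x2) = sign(x2 - x1): antisymmetric, non-decreasing in x2, values in {-1, 0, 1}."""
    return (x2 > x1) - (x2 < x1)


def expected_switch(g: Callable[[int, int], int], dist_b: Dist, dist_a: Dist) -> Fraction:
    """Exact E[g(X_b, X_a)] for independent X_b ~ dist_b and X_a ~ dist_a."""
    return sum((pb * pa * g(xb, xa) for xb, pb in dist_b.items() for xa, pa in dist_a.items()),
               Fraction(0))


def cdf(dist: Dist, z: int) -> Fraction:
    """Cumulative distribution function of a finite distribution evaluated at z."""
    return sum((p for x, p in dist.items() if x <= z), Fraction(0))


def fosd(dist_a: Dist, dist_b: Dist) -> bool:
    """True if F_a(z) <= F_b(z) at all support points, i.e., X_a first-order dominates X_b."""
    points = sorted(set(dist_a) | set(dist_b))
    return all(cdf(dist_a, z) <= cdf(dist_b, z) for z in points)


dist_a = {0: Fraction(1, 4), 1: Fraction(3, 4)}
dist_b = {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(fosd(dist_a, dist_b))                          # True
print(expected_switch(sign_switch, dist_b, dist_a))  # 1/4
print(expected_switch(sign_switch, dist_b, dist_b))  # 0
```

The last line mirrors the final equality in the proof. For an independent copy with the same distribution, anti-symmetry forces the expected switch to be zero.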

Lemma 2 establishes that the system induced by the updating rule defined by (14), together with the aggregator defined with respect to expected payoff maximizing or first-order stochastically dominant actions, satisfies WBERHR. For applications, Lemma 3 in Oyarzun and Ruf (2009) proves helpful in the second case, when the set of optimal actions is specified in the sense of first-order stochastic dominance. Fix $a, b \in A$ and $t \in \mathbb{N}_0$. That lemma states that the strict inequality $\mathbb{E}_t[g(x_{t+1}(b), x_{t+1}(a))] > 0$ holds under the following conditions:

- $g$ is symmetric-switch and, additionally, $g(x_1, x_2) > 0$ for all $x_1, x_2 \in X$ with $x_2 > x_1$;
- $x_{t+1}(a)$ first-order stochastically dominates $x_{t+1}(b)$, and their distributions are different.

Theorem 1 and Corollary 1 then yield the following corollary:

Corollary 3. With the notation and assumptions of Lemma 2, the following holds:

1. Suppose that $\sum_{t=0}^{\infty}\delta_t = \infty$ and $P_0 > 0$. Then, for all $\varepsilon > 0$, there exists a slowing sequence $\theta = \{\theta_t\}_{t\in\mathbb{N}_0}$ such that $\mathbb{P}(P^\theta_\infty = 1) > 1 - \varepsilon$. Here $P^\theta$ is the performance measure of the system defined by (7), (13), and (14), with symmetric-switch functions $g^\theta_{b,a,t}(\cdot,\cdot,\xi,\sigma) : X^2 \to [-1,1]$ such that $g^\theta_{b,a,t} = \theta_{t-1}g_{b,a,t}$ in (14), for all $a, b \in A$, $t \in \mathbb{N}$, $\xi \in \Xi_{t-1}$, and $\sigma \in \Sigma$.

2. Suppose that $P_0 > 0$ and, for the sake of simple notation, assume that $X = [0,1]$. For any $c \in (0,1]$, consider the behavioral rule in (15) with $\beta(t,\cdot,\cdot,\cdot,\cdot) = c/(t+1)$ for all $t \in \mathbb{N}$, with $p = 1$ in the first case of Lemma 2 and arbitrary $p > 0$ in the second case. This rule satisfies $\mathbb{P}(P_\infty = 1) = 1$, provided that

$$\inf_{t\in\mathbb{N}_0}\ \min_{a\in A^*,\,b\in A\setminus A^*}\big\{\mathbb{E}_t\big[x_{t+1}(a)^p - x_{t+1}(b)^p\big]\big\} > 0.$$

In particular, assume that $P_0 > 0$ and that the payoffs are independent and identically distributed over time, that is, $(x_t(c))_{c\in A}$ and $(x_{\tilde{t}}(c))_{c\in A}$ are independent and identically distributed for all $t \in \mathbb{N}$ and $\tilde{t} \in \mathbb{N} \setminus \{t\}$. Then the behavioral rule in (15) with $p = 1$ yields optimality with very high probability if $\beta(\cdot,\cdot,\cdot,\cdot,\cdot)$ is constant and sufficiently small, and almost surely if $\beta(t,\cdot,\cdot,\cdot,\cdot) = 1/(t+1)$ for all $t \in \mathbb{N}$.

There are well-known learning procedures with full information about payoffs that converge to choosing expected payoff maximizing actions. One such procedure is provided by Rustichini (1999); it relates to the Roth and Erev (1995) model of individual learning that we describe below. Another such procedure is fictitious play (see, e.g., Fudenberg and Levine (1998)). In contrast to the model outlined above, neither Rustichini's model of learning with full information nor fictitious play satisfies the property that the probability of choosing optimal actions is a submartingale. Hence, these models cannot satisfy BERHR or WBERHR.



4.1.3 Roth-Erev learning model 

In the previous subsections, we have used Theorem 1 and Corollary 1 to provide sufficient conditions for achieving optimality either with high probability or almost surely. In this subsection, we show that the almost-sure optimality of Roth and Erev's behavioral rule can be derived from the slowing-down argument developed in Theorem 2. This learning rule has been widely used to describe learning in experimental economics (see, e.g., Roth and Erev (1995)). Its convergence properties have been studied in several places in the literature (see Beggs (2005) and Hopkins and Posch (2005), and the references therein).




Roth and Erev's behavioral rule can be associated to an information space $\mathfrak{F} = \mathfrak{F}_t = \mathbb{R}^{|A|}_{++}$ for all $t \in \mathbb{N}_0$. An element in this set is typically interpreted as a vector of current "attractions" corresponding to each action, and the probability of choosing each of them is proportional to its attraction. The updating rule for information, $\Pi^{\mathfrak{F}} = \{\Pi^{\mathfrak{F}}_t\}_{t\in\mathbb{N}}$, is given by $\Pi^{\mathfrak{F}}_t(f, \cdot) = f + \sum_{a\in A}\mathbf{1}_{\{a_t(\cdot,\sigma) = a\}}\, x_t(a)\, e_a$ for all $t \in \mathbb{N}$, $f \in \mathfrak{F}$, and $\sigma \in \Sigma$, where $e_a$ is a unit vector in $\mathbb{R}^{|A|}$ with a 1 in the $a$-th position and zeros elsewhere. This means that only the attraction of the chosen action is updated, and it is increased by an amount equal to the obtained payoff. Let $V_t := \sum_{a\in A} f_{a,t}$, $t \in \mathbb{N}_0$, denote the sum of attractions at time $t$.

The updating rule of Roth and Erev is defined by the time-invariant behavioral rule $L: A \times X \times \mathfrak{F} \to \Delta(A)$ with

$$L(a,x,f)_a = \frac{f_a + x}{\sum_{c\in A} f_c + x} \quad\text{and}\quad L(a,x,f)_b = \frac{f_b}{\sum_{c\in A} f_c + x}$$

for all $a \in A$, $b \in A \setminus \{a\}$, $x \in X$, and $f \in \mathfrak{F}$. The updating rule of this model is thus defined by

$$\Pi_t(\omega, f, \sigma) = L\big(a_t(\omega,\sigma),\, x_t(a_t(\omega,\sigma),\omega),\, f\big)$$

for all $t \in \mathbb{N}$, $\omega \in \Omega_t$, $f \in \mathfrak{F}_{t-1}$, and $\sigma \in \Sigma$. We shall assume that the set of payoffs $X = [x_{\min}, x_{\max}]$ satisfies $x_{\min} \geq 0$. The set of optimal actions we consider here is the set of expected payoff maximizing actions $A^*$ defined in (12).
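The Roth-Erev rule described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' code: attractions are stored as a list indexed by action, and `payoff_fn` is a hypothetical stand-in for the payoff process $x_t(a)$. The function `behavioral_rule` implements $L(a, x, f)$ exactly as displayed above, and `roth_erev_step` performs one draw-and-reinforce period.

```python
import random


def choice_probabilities(f):
    """sigma(a) = f_a / sum_c f_c: choice probabilities proportional to attractions."""
    total = sum(f)
    return [f_a / total for f_a in f]


def behavioral_rule(a, x, f):
    """L(a, x, f): next-period choice probabilities after action a earned payoff x.

    Only the attraction of the chosen action a is increased by x.
    """
    denom = sum(f) + x
    return [(f[b] + x) / denom if b == a else f[b] / denom for b in range(len(f))]


def roth_erev_step(f, payoff_fn, rng):
    """One period: draw an action with probability proportional to its attraction,
    then add the realized payoff to that action's attraction only.

    payoff_fn(a, rng) is a hypothetical payoff process returning a value in X.
    """
    a = rng.choices(range(len(f)), weights=choice_probabilities(f), k=1)[0]
    x = payoff_fn(a, rng)
    f_next = list(f)
    f_next[a] += x
    return a, x, f_next


# Illustration with two actions and i.i.d. uniform payoffs (an assumption):
# action 1 pays on average more than action 0.
rng = random.Random(0)
f = [1.0, 1.0]
for _ in range(1000):
    _, _, f = roth_erev_step(f, lambda a, r: r.uniform(0.0, 1.0) + 0.5 * a, rng)
print(choice_probabilities(f))
```

The choice probabilities computed from the updated attractions coincide with $L(a, x, f)$, which is why the model can be analyzed either through attractions or directly through the behavioral rule.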


Beggs (2005) provides a thorough analysis of the Roth and Erev (1995) model of individual learning with partial information. Here we recover and slightly generalize his convergence result,^14 using a different argument, and hence provide a different interpretation of the convergence properties of this learning model. The proof we provide here follows from Theorem 2 and can be found in Appendix B. Hence, our argument is based on the analysis of the properties of the expected relative hazard rates of this learning model.

Corollary 4. Suppose that there exists an almost surely strictly positive random variable $\varepsilon$ with

$$\inf_{t\in\mathbb{N}_0}\ \min_{a\in A^*,\, b\in A\setminus A^*}\big\{\mathbb{E}_t[x_{t+1}(a) - x_{t+1}(b)]\big\} \geq \varepsilon > 0. \qquad (17)$$

If an individual makes choices according to Roth and Erev's behavioral rule, then $P_\infty = 1$ almost surely; that is, in the limit, the individual will almost surely play an optimal action.

We remark that, apart from (17), we have made no assumptions on either the stationarity or the independence of the actions' payoffs. Finally, we also remark that although we have explicitly specified the system defined by this learning model, our convergence result is based on Theorem 2, so the proof of Corollary 4 relies only on the analysis of the performance measure.

4.2 Social learning 

In this subsection, we allow individuals to observe other individuals' choices and obtained payoffs. This 
may result in individuals imitating the individuals they observe, thus allowing learning by imitation. 
We formally describe imitation using a behavioral rule $\{L_t^{(i)}\}_{t\in\mathbb{N}}$ that we decompose into two behavioral
components. The first of them, which we call the imitation component, corresponds to how the information 
in the current sample of observed choices and payoffs affects the individual's behavior. In particular, this 
component represents individuals' drive to imitate what others do. The second component of behavior 
corresponds to the probabilities of choosing each action in the previous period. These probabilities only 
reflect past information the individual has observed and hence may be interpreted as a purely inertial 
behavioral component. 

We assume that at time $t = 1$, each individual $i \in W$ chooses each action with exogenously given probabilities $\sigma_0^{(i)} \in \Delta(A)$. In the following periods, individuals are allowed to imitate other individuals. We assume that at time $t + 1$ individuals make choices through imitation with probability $\lambda_{t-1} \in [0, 1]$, for all $t \in \mathbb{N}$, or, otherwise, they choose each action with the same probability as in the last period.



^14 E.g., Beggs (2005) shows his result under the additional assumption that $x_{\min} > 0$.
We call $\lambda := \{\lambda_t\}_{t\in\mathbb{N}_0}$ the imitation rate, and we assume that $\lambda_t$ is $\mathcal{F}_{[0,t]}$-measurable and the same for all individuals. The behavioral rule's imitation component describes the probability of imitating each of the other individuals (or simply repeating her own choice in the previous period). These probabilities are determined as a function of the new information the individual receives about the chosen actions and obtained payoffs in the population.

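In particular, individuals have a common information updating rule such that $\mathfrak{F}_0^{(i)}$ is an arbitrary singleton, and $\mathfrak{F}_t^{(i)} = \mathfrak{F}_{t-1}^{(i)} \times \big(\bigcup_{k=1}^{|W|}\mathcal{W}_i^k \times A^k \times X^k\big)$ for all $i \in W$ and $t \in \mathbb{N}$, where $\mathcal{W}_i^k$ is the set of all subsets of $W$ that contain individual $i$ and $k - 1$ other individuals, for $k \in \{1, \dots, |W|\}$. Thus, our model allows an individual to observe any number of individuals in the population, the actions they chose, and the payoffs they obtained. The set of all possible "aggregated" information accumulated up to time $t \in \mathbb{N}_0$ is given by $\mathfrak{F}_t = \times_{i\in W}\mathfrak{F}_t^{(i)}$, and the information updating rule is assumed to be

$$\Pi^{\mathfrak{F}}_t(f, \omega, \sigma) = \Big\{f^{(i)} \times \Big(s_t^{(i)}(\omega),\, a_t^{(s_t^{(i)}(\omega))}(\omega, \sigma),\, x_t^{(s_t^{(i)}(\omega))}(\omega, \sigma)\Big)\Big\}_{i\in W}$$

for all $t \in \mathbb{N}$, $\omega \in \Omega_t$, $f \in \mathfrak{F}_{t-1}$, and $\sigma \in \Sigma$. Thus, the information updating rule reflects all the information revealed to each individual up to each point in time.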

In order to define the updating rule of this model, we begin by specifying the imitation component $\hat{L}_t^{(i)}$ of individual $i$'s behavioral rule, a function defined on

$$O_t^{(i)} := \Big(\bigcup_{k=1}^{|W|}\mathcal{W}_i^k \times A^k \times X^k\Big) \times \mathfrak{F}_{t-1}^{(i)} \times \Delta(A)$$

for all $i \in W$ and $t \in \mathbb{N}$. Therefore, $O_t^{(i)}$ represents all the possible new information individual $i$ may receive at time $t$, along with her previous information state and current probabilities of choosing each action. The imitation component maps each observation in the information set $O_t^{(i)}$ to the probability of imitating any of the observed individuals at each time $t$; therefore, $\hat{L}_t^{(i)}: O_t^{(i)} \to \Delta(W)$. Thus, the probability that $i \in W$ imitates at time $t + 1$ what $j \in W$ did at time $t \in \mathbb{N}$ is $\lambda_{t-1}\hat{L}_t^{(i)}(\cdot)_j$. We shall (explicitly) assume the "must-see" condition $\sum_{j\in s}\hat{L}_t^{(i)}(s, \cdot)_j = 1$ for all $i \in W$, $t \in \mathbb{N}$, and samples $s$. Thus, we assume that individuals may only imitate the individuals they observe. This is a standard assumption in the literature (see, e.g., Cubitt and Sugden).



We define the behavioral rule of each individual $i \in W$, $L_t^{(i)}: O_t^{(i)} \to \Delta(A)$, which represents her probability of choosing each action at time $t + 1$, in a two-step procedure. First, we map the range $\Delta(W)$ of the imitation component $\hat{L}_t^{(i)}$ to $\Delta(A)$ by assigning to each action $a \in A$ a probability of being chosen that is equal to the sum of the probabilities of imitating each of the individuals who chose $a$ at time $t$, which corresponds to $\lambda_{t-1}\sum_{j\in s_t^{(i)}:\, a_t^{(j)} = a}\hat{L}_t^{(i)}\big(s_t^{(i)}, a_t^{(s_t^{(i)})}, x_t^{(s_t^{(i)})}, f^{(i)}, \sigma^{(i)}\big)_j$, for all $t \in \mathbb{N}$. Then, we add the probability of choosing $a$ according to the inertial behavioral component, $(1 - \lambda_{t-1})\sigma^{(i)}(a)$. Thus, for each individual $i \in W$,

$$L_t^{(i)}\big(s, a^{(s)}, x^{(s)}, f^{(i)}, \sigma^{(i)}\big)_a = \lambda_{t-1}\sum_{j\in s:\, a^{(j)} = a}\hat{L}_t^{(i)}\big(s, a^{(s)}, x^{(s)}, f^{(i)}, \sigma^{(i)}\big)_j + (1 - \lambda_{t-1})\,\sigma^{(i)}(a) \qquad (18)$$

for all $a \in A$, $(s, a^{(s)}, x^{(s)}, f^{(i)}, \sigma^{(i)}) \in O_t^{(i)}$, and $t \in \mathbb{N}$.
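The two-step construction in (18) can be sketched as follows. The data layout (dictionaries keyed by individual) and the function name are our own illustrative choices, not the paper's: `lam` stands for $\lambda_{t-1}$, `imitation_probs` for the values $\hat{L}_t^{(i)}(\cdot)_j$ over the sample $s$, `sample_actions` for $a^{(s)}$, and `sigma_prev` for $\sigma^{(i)}$.

```python
def behavioral_rule_18(imitation_probs, sample_actions, sigma_prev, lam):
    """Sketch of (18): mix the imitation component with inertia.

    imitation_probs: dict j -> probability of imitating sampled individual j
                     (must-see: keys lie in the sample and values sum to 1).
    sample_actions:  dict j -> action index chosen by j at time t.
    sigma_prev:      list of previous choice probabilities sigma^(i)(a).
    lam:             imitation probability lambda_{t-1} in [0, 1].
    """
    if abs(sum(imitation_probs.values()) - 1.0) > 1e-9:
        raise ValueError("must-see condition violated: probabilities must sum to 1")
    # Inertial component: keep last period's probabilities with weight 1 - lam.
    out = [(1.0 - lam) * p for p in sigma_prev]
    # Imitation component: action a receives the total imitation probability
    # of all sampled individuals who chose a.
    for j, q in imitation_probs.items():
        out[sample_actions[j]] += lam * q
    return out


print(behavioral_rule_18({0: 0.2, 1: 0.5, 2: 0.3}, {0: 0, 1: 1, 2: 1}, [0.6, 0.4], 0.5))
```

Because the imitation weights sum to one by the must-see condition, the output is again a probability vector for any $\lambda_{t-1} \in [0, 1]$.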

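We can now specify the updating rule of this model at each time $t \in \mathbb{N}$ as a collection of individual behavioral rules:

$$\Pi_t(\omega, f, \sigma) = \Big\{L_t^{(i)}\Big(s_t^{(i)}(\omega),\, a_t^{(s_t^{(i)}(\omega))}(\omega,\sigma),\, x_t^{(s_t^{(i)}(\omega))}\big(a_t^{(s_t^{(i)}(\omega))}(\omega,\sigma), \omega\big),\, f^{(i)},\, \sigma^{(i)}\Big)\Big\}_{i\in W} \qquad (19)$$

for all $t \in \mathbb{N}$, $\omega \in \Omega_t$, $f \in \mathfrak{F}_{t-1}$, and $\sigma \in \Sigma$.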

The probability that $i \in W$ chooses an optimal action at time $t + 1$, given the information up to $t$, is denoted by $p_t^{(i)} := \sum_{a\in A^*}\sigma_t^{(i)}(a)$. Thus, the performance measure, i.e., the expected fraction of the population who chooses optimal actions, given the information up to the moment before choices take place, is given by $P_t := \frac{1}{|W|}\sum_{i\in W} p_t^{(i)}$; that is, the aggregator of the system is given by (7).

We consider several possibilities regarding sampling. Yet, all these possibilities satisfy two properties that seem hard to dispense with and whose role in the sequel is discussed below. Let $p_t^{(i)}(s)$ be the probability that individual $i$ observes sample $s$ at time $t + 1$, for all $i \in W$, $s \in \mathcal{P}(W)$, and $t \in \mathbb{N}_0$, where we assume that $p_t^{(i)}(s)$ is $\mathcal{F}_{[0,t]}$-measurable and $p_t^{(i)}(s) = 0$ for all $s \subseteq W \setminus \{i\}$. We say that the sampling process is symmetric if each individual in a subset of individuals observes this subset with the same probability, i.e., $p_t^{(i)}(s) = p_t^{(j)}(s)$ for all $s \in S(i,j) := \{s' \in \mathcal{P}(W) : i, j \in s'\}$, $i \in W$, $j \in W \setminus \{i\}$, and $t \in \mathbb{N}_0$. We say that the sampling process satisfies observability (with lower bound $\xi$) if the probability that any individual observes any other individual is bounded away from zero, to wit, if there exists a positive random variable $\xi > 0$ such that $\sum_{s\in S(i,j)} p_t^{(i)}(s) \geq \xi$ for all $t \in \mathbb{N}_0$, $i \in W$, and $j \in W \setminus \{i\}$.
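For a finite population, the two sampling properties can be checked mechanically at a fixed date $t$. The sketch below, under our own assumptions, represents $p_t^{(i)}(s)$ as a dictionary mapping frozensets (samples) to probabilities; `uniform_pair_sampling` is a hypothetical example in which each individual observes exactly one other individual, drawn uniformly at random.

```python
from itertools import combinations


def uniform_pair_sampling(n):
    """Hypothetical sampling process: individual i observes exactly one other
    individual j, drawn uniformly; the sample is s = {i, j}."""
    return {i: {frozenset((i, j)): 1.0 / (n - 1) for j in range(n) if j != i}
            for i in range(n)}


def is_symmetric(p, n, tol=1e-12):
    """p_i(s) == p_j(s) for every pair i != j and every sample s containing both."""
    for i, j in combinations(range(n), 2):
        for s in set(p[i]) | set(p[j]):
            if i in s and j in s and abs(p[i].get(s, 0.0) - p[j].get(s, 0.0)) > tol:
                return False
    return True


def observability_bound(p, n):
    """Largest xi with sum_{s in S(i,j)} p_i(s) >= xi for all ordered pairs i != j."""
    return min(sum(q for s, q in p[i].items() if i in s and j in s)
               for i in range(n) for j in range(n) if j != i)


p = uniform_pair_sampling(4)
print(is_symmetric(p, 4), observability_bound(p, 4))
```

With uniform pair sampling every sample $\{i, j\}$ is observed with probability $1/(|W| - 1)$ by both of its members, so the process is symmetric and observable with $\xi = 1/(|W| - 1)$.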

The next lemma establishes that a system satisfies WBERHR if sampling is symmetric and observable, 
and an individual who chose a non-optimal action is more likely to imitate an individual who chose an 
optimal action than vice versa, whenever both of them observe the same sample. 

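Lemma 3. Define the random variables

$$\delta_t := \min_{i\in W,\, j\in W\setminus\{i\}}\ \min_{s\in S(i,j)}\ \min_{\substack{a^{(s)}\in A^{|s|}:\\ a^{(i)}\in A^*,\, a^{(j)}\in A\setminus A^*}}\ \inf_{\substack{(f^{(i)}, f^{(j)})\in\mathfrak{F}_t^{(i)}\times\mathfrak{F}_t^{(j)},\\ (\sigma^{(i)}, \sigma^{(j)})\in(\Delta(A))^2}} \mathbb{E}_t\Big[\hat{L}_{t+1}^{(j)}\big(s, a^{(s)}, x_{t+1}^{(s)}, f^{(j)}, \sigma^{(j)}\big)_i - \hat{L}_{t+1}^{(i)}\big(s, a^{(s)}, x_{t+1}^{(s)}, f^{(i)}, \sigma^{(i)}\big)_j\Big] \qquad (20)$$

for all $t \in \mathbb{N}_0$. Suppose that $s_{t+1}^{(j)}$ is conditionally independent, given the information up to time $t$, of $\{x_{t+1}^{(i)}\}_{i\in W}$ for all $j \in W$ and $t \in \mathbb{N}_0$.^15 Suppose, furthermore, that sampling is symmetric and observable (with lower bound $\xi > 0$) and $\hat{L}$ satisfies the must-see condition. Provided that $\delta_t \geq 0$ for all $t \in \mathbb{N}_0$, the system $(\Pi, \mathfrak{A})$ defined by (7), (18), and (19) then satisfies WBERHR with lower bound sequence $\hat{\delta} = \{\hat{\delta}_t\}_{t\in\mathbb{N}_0}$ given by $\hat{\delta}_t = \lambda_t(|W| - 1)\,\xi\,\delta_t$ for all $t \in \mathbb{N}_0$.

^15 The proof also relies on the fact, guaranteed through the setup of the underlying randomization device, that the choice of actions $\{a_{t+1}^{(i)}\}_{i\in W}$ is a set of independent random variables that is, moreover, independent of everything else for all $t \in \mathbb{N}_0$.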

The proof of this lemma is provided in Appendix B.

We can, therefore, provide a bound on the expected change in the fraction of the population who chooses an optimal action in terms of the imitation probabilities among individuals. The expected net switch of probabilities of choosing each action is determined both by the information that becomes available to individuals and by how their behavioral rules transform it into updated probabilities.

Let 

for all $t \in \mathbb{N}$, $i \in W$, $j \in W \setminus \{i\}$, $s \in S(i,j)$, and $x^{(s)} \in X^{|s|}$; we assume, for the sake of simple notation only, that $g_{t,i,j,s}$ does not depend on $a^{(s)}$, $f^{(i)}$, $\sigma^{(i)}$, $f^{(j)}$, or $\sigma^{(j)}$.

In order to guarantee that the condition $\delta_t \geq 0$ in Lemma 3 is satisfied, we impose further restrictions on how the imitation components respond to payoffs. The following lemma provides a set of conditions.

Lemma 4. Suppose that the distribution of $x_t^{(i)}(a)$ does not depend on $i \in W$ for all $t \in \mathbb{N}$ and $a \in A$, and consider two cases:

1. Let $A^*$ be defined as in (12), and assume that $g_{t,i,j,s}(x^{(s\setminus\{i,j\})}, \cdot)$ is symmetric-switch, and linear and non-decreasing in $x^{(j)}$, for all $t \in \mathbb{N}$, $i \in W$, $j \in W \setminus \{i\}$, $s \in S(i,j)$, and $x^{(s\setminus\{i,j\})} \in X^{|s|-2}$.

2. Let $A^*$ be defined as in (16), and assume that $g_{t,i,j,s}(x^{(s\setminus\{i,j\})}, \cdot)$ is symmetric-switch for all $t \in \mathbb{N}$, $i \in W$, $j \in W \setminus \{i\}$, and $s \in S(i,j)$. Moreover, suppose that, given the information up to time $t$, $\{x_{t+1}^{(i)}\}_{i\in W}$ are independent for all $t \in \mathbb{N}_0$. Alternatively, suppose that $g_{t,i,j,s}$ can be written as a sum of functions that each depend only on a pair of payoffs $(x^{(k)}, x^{(l)})$, $k, l \in s$, for all $x^{(s)} \in X^{|s|}$, in which case we only require pairwise independence of $\{x_{t+1}^{(i)}\}_{i\in W}$.

Then, in either of the two cases, we have $\delta_t \geq 0$ for all $t \in \mathbb{N}_0$ for $\delta_t$ defined in (20).

Proof. The first case is clear. Now fix $t \in \mathbb{N}$, $i \in W$, $j \in W \setminus \{i\}$, and $s \in S(i,j)$, and consider the second case. First, condition on $x_{t+1}^{(s\setminus\{i,j\})}$ and then proceed as in the proof of Lemma 2 to obtain the statement. Under the alternative assumption, directly use the ideas of the proof of Lemma 2. □



This result reveals that, under symmetric and observable sampling, whether the relative hazard rates are positive depends on the behavioral rules, the payoff distributions, and the criterion used to define the performance measure. Here, we considered expected payoff and first-order stochastic dominance, but other criteria are possible. We remind the reader of the discussion in Subsection 4.1.2 of Lemma 3 in Oyarzun and Ruf (2009), which yields sufficient conditions for the strict inequality $\delta_t > 0$.
An example of a behavioral rule that satisfies either condition in Lemma 4 is given by

L« (s,a^^\x^^\f\a^'A . = ^— and L« (s, a^, x^^), f», a») . = 1 - ^f'^^'^ (21) 

for all $t \in \mathbb{N}$, $i \in W$, $j \in W \setminus \{i\}$, $s \in S(i,j)$, $a^{(s)} \in A^{|s|}$, $x^{(s)} \in X^{|s|}$, $f^{(i)} \in \mathfrak{F}_{t-1}^{(i)}$, and $\sigma^{(i)} \in \Delta(A)$. Another example that satisfies Condition 2 in Lemma 4 and does not impose linearity of the behavioral component in the observed payoffs of the sampled individuals is given by

$$\hat{L}_t^{(i)}\big(s, a^{(s)}, x^{(s)}, f^{(i)}, \sigma^{(i)}\big)_j = \frac{f(x^{(j)})}{F^{(s)}} \quad\text{with}\quad F^{(s)} := \sum_{k\in s} f(x^{(k)}) \qquad (22)$$

for all $t \in \mathbb{N}$, $i, j \in W$, $s \in S(i,j)$, $a^{(s)} \in A^{|s|}$, $x^{(s)} \in X^{|s|}$, $f^{(i)} \in \mathfrak{F}_{t-1}^{(i)}$, and $\sigma^{(i)} \in \Delta(A)$, where $f: X \to [0, \infty)$ is any non-negative and non-decreasing function; e.g., $f(x) = x$. The specification of such an imitation component resembles that of the Roth-Erev model of individual learning. In that model, the probability of choosing each action is proportional to the cumulative reinforcement, determined by the payoffs the individual has received over time with each action. Here, the probability of imitating each other sampled individual is proportional to the payoff that such an individual received, and hence the probability of choosing the corresponding action through imitation is proportional to the sum of payoffs it provided to the sampled individuals who have chosen this action.
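The payoff-proportional imitation component and its aggregation by action can be sketched as below. This is an illustration under our own assumptions: samples are dictionaries from individual to payoff, the sample includes the individual herself (as the must-see condition allows), and we simply raise an error when all weights vanish, a degenerate case the text does not address.

```python
def payoff_proportional_imitation(sample_payoffs, f=lambda x: x):
    """Sketch of (22): imitate each sampled individual j (including oneself)
    with probability f(x_j) / sum_k f(x_k)."""
    weights = {j: f(x) for j, x in sample_payoffs.items()}
    if any(w < 0 for w in weights.values()):
        raise ValueError("f must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("degenerate sample: all weights are zero")
    return {j: w / total for j, w in weights.items()}


def imitation_by_action(imitation_probs, sample_actions):
    """Total imitation probability assigned to each action, i.e. the
    normalized sum of f(payoffs) earned by the sampled users of that action."""
    by_action = {}
    for j, q in imitation_probs.items():
        a = sample_actions[j]
        by_action[a] = by_action.get(a, 0.0) + q
    return by_action


probs = payoff_proportional_imitation({0: 2.0, 1: 1.0, 2: 1.0})
print(probs, imitation_by_action(probs, {0: "A", 1: "B", 2: "B"}))
```

In the example, action "A" earned a total payoff of 2 and action "B" also a total of 2 in the sample, so each receives half of the imitation probability, mirroring the Roth-Erev analogy in the text.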

We can now apply Theorem 1 and Corollary 1 to provide sufficient conditions to achieve optimality with high probability or almost surely:

Corollary 5. With the notation and assumptions of Lemmas 3 and 4, the following holds:

1. Suppose that $\sum_{t=0}^\infty \lambda_t = \infty$ and $P_0 > 0$. Then, for all $\varepsilon > 0$, there exists a slowing sequence $\theta = \{\theta_t\}_{t\in\mathbb{N}_0}$ (constant if $\inf_{t\in\mathbb{N}_0}\delta_t \geq \delta$ almost surely for some constant $\delta > 0$) such that $\mathbb{P}(P_\infty = 1) \geq 1 - \varepsilon$. Here, $P$ is the performance measure of a system defined by (7), (18), and (19) with $\lambda_t$ replaced by $\theta_t\lambda_t$ for all $t \in \mathbb{N}_0$.

2. Suppose that $P_0 > 0$ and

$$\inf_{t\in\mathbb{N}_0}\ \min_{a\in A^*,\, b\in A\setminus A^*}\big\{\mathbb{E}_t[f(x_{t+1}(a)) - f(x_{t+1}(b))]\big\} > 0 \qquad (23)$$

for either $f(x) = x$ or any non-negative, non-decreasing, and bounded function $f$ on $X$. Then, with $\lambda_t = 1/(t + 2)$ for all $t \in \mathbb{N}_0$, the performance measure $P$ of the system $(\Pi, \mathfrak{A})$ defined by (7), (18), and (21) or (22) satisfies $\mathbb{P}(P_\infty = 1) = 1$.



The condition in (23) is satisfied if the payoffs are independent and identically distributed over time, either trivially if $f(x) = x$, or due to Lemma 3 in Oyarzun and Ruf (2009) if $f$ is additionally assumed to be strictly increasing, similarly as in Subsection 4.1.2.
When we impose that each individual only observes one other individual, i.e., $\sum_{j\in W\setminus\{i\}} p_t^{(i)}(\{i,j\}) = 1$ for all $i \in W$, the imitation components of the behavioral rules in this section correspond to the first-order monotone behavioral rules in Oyarzun and Ruf (2009). When we further assume that the symmetric-switch functions are linear, the class of imitation components of the behavioral rules contains the improving behavioral rules of Schlag (1998). When we impose that each individual observes two other individuals, i.e., $\sum_{s\in\mathcal{P}(W):\, i\in s,\, |s| = 3} p_t^{(i)}(s) = 1$ for all $i \in W$, and the symmetric-switch functions are linear, the imitation component satisfies the characterization of strictly improving rules in Schlag (1999). We are not aware of any paper in the literature that provides implications for the relative hazard rates for the other possibilities of sampling that we allow.

In contrast with previous convergence results that rely on populations that are a continuum, our analysis reveals that behavioral rules with an improving imitation component, as described in Corollary 5, yield optimality with arbitrarily high probability even for finite populations. As the second part of this result reveals, adding the extra assumption that the weight of the imitation component of behavior decreases over time as $1/(t + 2)$ is enough to obtain optimality almost surely.



5 Discussion 

Our analysis of systems that satisfy WBERHR or BERHR can be the starting point for the study of slightly more complex dynamics. There are many other models in the literature with characteristics similar to those considered here that do not satisfy these properties. One example is the model of word-of-mouth social learning in Ellison and Fudenberg (1995). In their model, individuals sample $n \in \mathbb{N}$ other individuals out of a continuum population and choose the action that has the highest average payoff in their observed sample. Aggregate shocks (on top of individual-specific shocks) to the payoffs yielded by the two available actions allow for randomness despite the population's cardinality. For $n = 1$, their model satisfies BERHR, and hence their findings are recovered by our results. In particular, the population may "herd" to the action with the lowest expected payoff with positive probability, and this probability goes to zero when there is enough inertia (which is equivalent to slowing down in our analysis). For $n \geq 2$, however, their model does not satisfy WBERHR, and thus our results tell us nothing about the asymptotic properties of their model. Future research could study related conditions on these systems that make it possible to analyze the asymptotic properties of the models in a general framework encompassing their findings for these cases.

Another possible extension is the study of properties of systems that satisfy (W)BERHR in games. Beggs (2005) shows that the Roth-Erev learning model leads individuals to play, in the limit, actions eliminated by iterated deletion of dominated strategies with probability zero. It does not seem that our analysis could be extended in a straightforward manner to this context. This is a topic that deserves further attention in the future.

The trade-off between speed and the probability of achieving optimality is of particular interest in the literature on learning automata (see, e.g., Narendra and Thathachar (1989)) and hence worthy of further study. An approach is provided by the classic analysis of multi-armed bandits (see, e.g., Rothschild (1974)). Since in that analysis (intertemporal) preferences are exogenously given and intertemporal discounting is assumed to be geometric, this provides only a particular manner of dealing with this trade-off.

In our analysis of the properties of the dynamics of choices in social learning models, our sampling assumptions may seem restrictive in some setups. For instance, observability may rule out network structures in which individuals may sample some other individuals in the network with zero probability (see, e.g., Bala and Goyal (1998)). It is intuitive, however, that the choices of individuals who are not sampled may be observed after a number of periods, provided that there is a path of individuals connecting the individual who chose one action and another who chooses that action later via imitation. Analyzing the dynamics of the optimality measure in such structures would require developing further the constructions we have provided above. We leave this for future research.



A Proofs of the convergence results in Section 3

In this appendix, we provide the proofs for the statements of Section 3. We start with some auxiliary lemmas.

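Lemma 5. If the system $(\Pi, \mathfrak{A})$ satisfies WBERHR with lower bound sequence $\delta$, then $(\Pi^\theta, \mathfrak{A})$ satisfies WBERHR with lower bound sequence $\{\theta_t\delta_t\}_{t\in\mathbb{N}_0}$.

Proof. From the definition of the WBERHR property and the linearity of $\mathfrak{A}$, we obtain

$$\mathbb{E}_t\big[\mathfrak{A}(\Pi^\theta_{t+1}(f,\sigma))\big] - \mathfrak{A}(\sigma) = \mathbb{E}_t\big[\mathfrak{A}(\Pi^\theta_{t+1}(f,\sigma) - \sigma)\big] = \theta_t\,\mathbb{E}_t\big[\mathfrak{A}(\Pi_{t+1}(f,\sigma) - \sigma)\big] \geq \theta_t\delta_t\,\mathfrak{A}(\sigma)\big(1 - \mathfrak{A}(\sigma)\big)$$

for all $f \in \mathfrak{F}_t$, $\sigma \in \Sigma$, and $t \in \mathbb{N}_0$. □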
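The only property of $\mathfrak{A}$ used in this proof is linearity: slowing the update by $\theta_t$ scales the increment of the performance measure realization by realization, and hence also in conditional expectation. A minimal numerical check, assuming for illustration that $\mathfrak{A}(\sigma)$ is the total probability on a set of optimal actions (as in Section 4) and using hypothetical numbers:

```python
def performance(sigma, optimal):
    """A linear performance measure: total probability on the set of optimal actions."""
    return sum(sigma[a] for a in optimal)


def slowed_update(sigma, sigma_next, theta):
    """Pi^theta = sigma + theta * (Pi - sigma), with theta in (0, 1]."""
    return [s + theta * (p - s) for s, p in zip(sigma, sigma_next)]


sigma = [0.5, 0.3, 0.2]
sigma_next = [0.1, 0.6, 0.3]  # one hypothetical realization of Pi_{t+1}(f, sigma)
optimal = {1}
theta = 0.25
lhs = performance(slowed_update(sigma, sigma_next, theta), optimal) - performance(sigma, optimal)
rhs = theta * (performance(sigma_next, optimal) - performance(sigma, optimal))
print(lhs, rhs)
```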



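Lemma 6. If the system $(\Pi, \mathfrak{A})$ satisfies WBERHR with lower bound sequence $\delta = \{\delta_t\}_{t\in\mathbb{N}_0}$ and if $\sum_{t=0}^\infty \delta_t = \infty$ almost surely, then $P_\infty \in \{0, 1\}$ almost surely for all $f_0 \in \mathfrak{F}_0$ and $\sigma_0 \in \Sigma$.

Proof. Assume that $P_t$ does not almost surely converge to either 0 or 1. Then, there exists an $\varepsilon > 0$ such that $\mathbb{P}(\lim_{t\to\infty} P_t(1 - P_t) > 2\varepsilon) > 2\varepsilon$. Thus, there exists a $t_0 \in \mathbb{N}$ such that the event $B := \{P_t(1 - P_t) > \varepsilon \text{ for all } t \geq t_0\}$ satisfies $\mathbb{P}(B) > \varepsilon$. By the hypothesis,

$$\mathbb{E}_t[P_{t+1}] - P_t \geq \delta_t P_t(1 - P_t)$$

for all $t \in \mathbb{N}_0$; thus,

$$1 \geq \mathbb{E}[P_t] = P_0 + \sum_{r=0}^{t-1}\mathbb{E}[P_{r+1} - P_r] \geq \sum_{r=0}^{t-1}\mathbb{E}\big[\delta_r P_r(1 - P_r)\big] \geq \mathbb{E}\Big[\mathbf{1}_B\sum_{r=t_0}^{t-1}\delta_r P_r(1 - P_r)\Big] \to \infty$$

as $t$ tends to infinity (since $P_r(1 - P_r) > \varepsilon$ on $B$ for $r \geq t_0$, $\sum_r \delta_r = \infty$ almost surely, and $\mathbb{P}(B) > 0$), leading to a contradiction. □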



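Lemma 7. If the system $(\Pi, \mathfrak{A})$ satisfies (1) for some $f \in \mathfrak{F}_t$, $\sigma \in \Sigma$, $t \in \mathbb{N}_0$, and $\mathcal{F}_{[0,t]}$-measurable $\delta_t \geq 0$, then

$$\mathbb{E}_t\Big[\exp\Big(-\frac{\gamma}{\theta_t}\big(\mathfrak{A}(\Pi^\theta_{t+1}(f,\sigma)) - \mathfrak{A}(\sigma)\big)\Big)\Big] \leq 1$$

for all $\mathcal{F}_{[0,t]}$-measurable $\gamma \in [0, 1 \wedge \delta_t]$ and $\mathcal{F}_{[0,t]}$-measurable $\theta_t \in (0, 1]$.

Proof. We only need to show the statement for $\gamma > 0$. Thus, we assume, without loss of generality, that we are in the event $\{\delta_t > 0\}$. We define the function $G_\gamma: [0, 1] \to \mathbb{R}$ for all $\gamma \in (0, 1 \wedge \delta_t]$ by

$$G_\gamma(z) := z + \delta_t z(1 - z) - \frac{1 - e^{-\gamma z}}{1 - e^{-\gamma}}$$

and observe that

$$\frac{\partial^2}{\partial z^2}G_\gamma(z) = -2\delta_t + \frac{\gamma^2 e^{-\gamma z}}{1 - e^{-\gamma}} \leq -2\delta_t + \gamma\,\frac{\gamma}{1 - e^{-\gamma}} \leq 2(\gamma - \delta_t) \leq 0,$$

since $\gamma/(1 - e^{-\gamma}) \leq 2$ for all $\gamma \in (0, 1 \wedge \delta_t]$, and that $G_\gamma(0) = G_\gamma(1) = 0$. Therefore, $G_\gamma$ is concave and $G_\gamma(z) \geq 0$ for all $z \in [0, 1]$. Similarly, by the concavity of $z \mapsto 1 - e^{-\gamma z}$, we see that

$$z \leq \frac{1 - e^{-\gamma z}}{1 - e^{-\gamma}} \qquad (24)$$

for all $z \in [0, 1]$ and $\gamma \in (0, 1]$, which yields that

$$\mathbb{E}_t\Big[\frac{1 - e^{-\gamma\mathfrak{A}(\Pi_{t+1}(f,\sigma))}}{1 - e^{-\gamma}}\Big] \geq \mathbb{E}_t\big[\mathfrak{A}(\Pi_{t+1}(f,\sigma))\big] \geq G_\gamma(\mathfrak{A}(\sigma)) + \frac{1 - e^{-\gamma\mathfrak{A}(\sigma)}}{1 - e^{-\gamma}} \geq \frac{1 - e^{-\gamma\mathfrak{A}(\sigma)}}{1 - e^{-\gamma}},$$

where the first inequality comes from using (24) with $z = \mathfrak{A}(\Pi_{t+1}(f,\sigma))$ and the second inequality from using (1). This yields

$$\mathbb{E}_t\Big[\exp\Big(-\frac{\gamma}{\theta_t}\big(\mathfrak{A}(\Pi^\theta_{t+1}(f,\sigma)) - \mathfrak{A}(\sigma)\big)\Big)\Big] = \mathbb{E}_t\Big[e^{-\gamma(\mathfrak{A}(\Pi_{t+1}(f,\sigma)) - \mathfrak{A}(\sigma))}\Big] \leq 1,$$

where the equality uses the linearity of $\mathfrak{A}$, which proves the statement. □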
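The two elementary inequalities behind Lemma 7 are easy to check numerically. The sketch below evaluates $G_\gamma$ on a grid for a few hypothetical values of $\delta_t$ with $\gamma = 1 \wedge \delta_t$, and checks (24); a grid check illustrates, but of course does not prove, the inequalities. It also shows why the restriction $\gamma \leq \delta_t$ matters: with $\delta_t = 0.1$ and $\gamma = 1$, $G_\gamma$ becomes negative in the interior.

```python
import math


def G(z, gamma, delta):
    """G_gamma(z) = z + delta z (1 - z) - (1 - e^{-gamma z}) / (1 - e^{-gamma})."""
    return z + delta * z * (1 - z) - (1 - math.exp(-gamma * z)) / (1 - math.exp(-gamma))


def min_on_grid(func, n=1001):
    """Minimum of func over n equally spaced points of [0, 1]."""
    return min(func(k / (n - 1)) for k in range(n))


# With gamma = min(1, delta) the function is non-negative on [0, 1].
for delta in (0.05, 0.3, 1.0, 2.0):
    gamma = min(1.0, delta)
    print(delta, min_on_grid(lambda z: G(z, gamma, delta)))

# Inequality (24) for gamma = 0.7: z <= (1 - e^{-gamma z}) / (1 - e^{-gamma}).
gap_24 = min_on_grid(lambda z: (1 - math.exp(-0.7 * z)) / (1 - math.exp(-0.7)) - z)
print(gap_24, G(0.5, 1.0, 0.1))
```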



Now, we provide the proof of Theorem 1.



Proof (of Theorem 1). Fix the smallest integer $\gamma = \gamma(P_0)$ such that $e^{-\gamma P_0} < \varepsilon$. We define the slowing sequence $\theta$ by $\theta_t := (\delta_t \wedge 1)/\gamma \leq 1$ and observe that $\sum_{t=0}^\infty \theta_t\delta_t = \infty$. As $\eta_t$ is allowed to be one, it is sufficient to show the rest of the statement for the slowing sequence $\eta\theta$ for some sequence $\eta$ as in the statement. Towards this end, define the process $M = \{M_t\}_{t\in\mathbb{N}_0}$ as $M_t := e^{-\gamma P_t}$ for all $t \in \mathbb{N}_0$.

We start by observing that $M$ is a supermartingale since, for fixed $t \in \mathbb{N}_0$, we have that

$$\mathbb{E}_t[M_{t+1}] = M_t\,\mathbb{E}_t\big[e^{-\gamma(P_{t+1} - P_t)}\big] \leq M_t,$$

where the inequality follows from Lemma 7 (applied with $\gamma\eta_t\theta_t = \eta_t(\delta_t \wedge 1)$ in place of $\gamma$ and $\eta_t\theta_t$ in place of $\theta_t$). Thus, $M_t$ converges to some random variable $M_\infty \in [0, 1]$, and we obtain

$$\mathbb{P}(P_\infty = 1) = \mathbb{E}[P_\infty] \geq 1 - \mathbb{E}[M_\infty] \geq 1 - M_0 > 1 - \varepsilon,$$

where we have used Lemmas 5 and 6 in the first equality. This yields the statement. □
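For concreteness, the constants in this proof can be computed directly. With hypothetical inputs $P_0 = 0.5$ and $\varepsilon = 0.1$, the smallest integer with $e^{-\gamma P_0} < \varepsilon$ is $\gamma = 5$, since $e^{-2} \approx 0.135$ and $e^{-2.5} \approx 0.082$, so the bound $1 - M_0$ exceeds $0.9$. The list `deltas` below is an arbitrary illustration of $\{\delta_t\}$, and the function names are ours.

```python
import math


def smallest_gamma(p0, eps):
    """Smallest positive integer gamma with exp(-gamma * p0) < eps."""
    if not (p0 > 0 and 0 < eps < 1):
        raise ValueError("need P_0 > 0 and eps in (0, 1)")
    gamma = 1
    while math.exp(-gamma * p0) >= eps:
        gamma += 1
    return gamma


def slowing_sequence(deltas, gamma):
    """theta_t = min(delta_t, 1) / gamma, as in the proof of Theorem 1."""
    return [min(d, 1.0) / gamma for d in deltas]


gamma = smallest_gamma(0.5, 0.1)
theta = slowing_sequence([0.5, 2.0, 0.05], gamma)
print(gamma, theta, 1 - math.exp(-gamma * 0.5))
```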

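The proof of Theorem 2 is similar:

Proof (of Theorem 2). Fix $\varepsilon \in (0, 1)$ and observe that there exists a constant $\delta \in (0, 1)$ such that the event $A := \{\underline{\delta} > \delta\} \subseteq \{\min_{t\in\mathbb{N}_0}\{\delta_t/\theta_t\} > \delta\}$ satisfies $\mathbb{P}(A) > 1 - \varepsilon/2$. Define $y := -\log(\varepsilon/2)/\delta > 0$ and the process $M = \{M_t\}_{t\in\mathbb{N}_0}$ by

$$M_t := \exp\Big(-\delta\,\frac{P_t}{\theta_{t\wedge\rho}}\Big)\,\mathbf{1}_{\{\min_{n\in\{0,\dots,t\}}\{\delta_n/\theta_n\} > \delta\}}$$

for all $t \in \mathbb{N}_0$, where the stopping time $\rho$ is given in (6). As in the proof of Theorem 1, we start by showing that $M$ is a supermartingale. Towards this end, fix some $t \in \mathbb{N}_0$ and assume, without loss of generality, that we are on the event $\{\min_{n\in\{0,\dots,t\}}\{\delta_n/\theta_n\} > \delta\}$. Define now $\hat{P}_t := P_t$ and $\hat{P}_{t+1} := P_t + (P_{t+1} - P_t)/\theta_t \in [0, 1]$ and observe that $\mathbb{E}_t[\hat{P}_{t+1}] - \hat{P}_t \geq \delta\hat{P}_t(1 - \hat{P}_t)$. We then obtain that

$$\mathbb{E}_t[M_{t+1}] \leq \mathbb{E}_t\Big[\exp\Big(-\delta\,\frac{P_{t+1}}{\theta_{(t+1)\wedge\rho}}\Big)\Big] \leq M_t,$$

where we have used Lemma 7, in which we interpret $(P_t, P_{t+1})$ as the "slowed down version" of $(\hat{P}_t, \hat{P}_{t+1})$. Thus, as in the proof of Theorem 1, $M_t$ converges to some random variable $M_\infty \in [0, 1]$.

As $\rho < \infty$ almost surely by assumption, we obtain from a similar argument as in Lemma 6 that

$$\mathbb{P}(P_\infty = 1) = \mathbb{E}[P_\infty] \geq 1 - \mathbb{E}[M_\infty] - \mathbb{P}(A^c) \geq \mathbb{P}(A) - \mathbb{E}[M_\rho] \geq 1 - \frac{\varepsilon}{2} - e^{-\delta y} = 1 - \varepsilon,$$

similarly to the proof of Theorem 1, where $A^c := \Omega \setminus A$. As $\varepsilon$ was chosen arbitrarily, we obtain the statement. □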

The next lemmas are useful in the proof of Corollary 5.

Lemma 8. For a system $(\Pi, \mathfrak{A})$ with corresponding performance sequence $\{P_t\}_{t\in\mathbb{N}_0}$, we have $P^\theta_t \geq P_0/(t + 1)$ for all $t \in \mathbb{N}_0$ for the slowing sequence $\theta$ defined by $\theta_t = 1/(t + 2)$.

Proof. Assume that we have shown $P^\theta_t \geq P_0/(t + 1)$ for some fixed $t \in \mathbb{N}_0$. Then

$$P^\theta_{t+1} = \frac{t + 1}{t + 2}P^\theta_t + \frac{1}{t + 2}\,\mathfrak{A}\big(\Pi_{t+1}(f^\theta_t, \sigma^\theta_t)\big) \geq \frac{t + 1}{t + 2}P^\theta_t \geq \frac{P_0}{t + 2},$$

and the statement follows by induction. □
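Because $\mathfrak{A}$ takes values in $[0, 1]$, the induction above is a pathwise statement, so it can be checked on any sequence of unslowed values. The sketch below, with hypothetical inputs, iterates the slowed recursion and shows that the adversarial choice $\mathfrak{A}(\Pi_{t+1}(\cdot)) = 0$ in every period attains the bound $P_0/(t + 1)$ exactly.

```python
def slowed_performance_path(p0, unslowed):
    """P^theta_{t+1} = (1 - theta_t) P^theta_t + theta_t * u_t with theta_t = 1/(t+2),
    where u_t in [0, 1] plays the role of A(Pi_{t+1}(f^theta_t, sigma^theta_t))."""
    path = [p0]
    for t, u in enumerate(unslowed):
        theta = 1.0 / (t + 2)
        path.append((1.0 - theta) * path[-1] + theta * u)
    return path


# Worst case u_t = 0: the bound of Lemma 8 holds with equality.
worst = slowed_performance_path(0.6, [0.0] * 10)
# An arbitrary (hypothetical) sequence of unslowed values in [0, 1].
mixed = slowed_performance_path(0.6, [0.3, 0.9, 0.0, 1.0] * 50)
print([round(p, 4) for p in worst])
```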



Lemma 9. Let the system $(\Pi, \mathfrak{A})$ satisfy WBERHR with performance sequence $\{P_t\}_{t\in\mathbb{N}_0}$ and $P_0 > 0$, and let $\theta$ denote the slowing sequence defined by $\theta_t := 1/(t + 2)$ for all $t \in \mathbb{N}_0$. Fix $y > 0$ and define the stopping time $\rho$ as

$$\rho := \min\Big\{t \in \mathbb{N}_0 : P^\theta_t \geq \frac{y}{t + 2}\Big\}, \quad\text{with } \min\emptyset := \infty. \qquad (25)$$

Then $\mathbb{P}(\rho < \infty) = 1$.

Proof. We shall show the statement with $y$ in (25) replaced by $NP_0$, where $N := \lfloor y/P_0 \rfloor + 1$, with $\lfloor\cdot\rfloor$ here denoting the largest integer not exceeding the argument. Define $N_k := N^{2k} - 1$ for all $k \in \mathbb{N}_0$. Since $\mathbb{P}(\rho > t)$ is non-increasing in $t$, it is sufficient to show that $\mathbb{P}(\rho > N_k) \to 0$ as $k \to \infty$. This follows if there exists some constant $c \in [0, 1)$ independent of $k$ such that

$$p_1 := \mathbb{P}(\rho > N_{k+1}) \leq c\,\mathbb{P}(\rho > N_k) =: c\,p_0 \qquad (26)$$

for all $k \in \mathbb{N}$. Towards this end, Lemma 8 and the submartingality of $\{P^\theta_t\}_{t\in\mathbb{N}_0}$ yield



E 



P 



+ E 



< 



NPo ^ ( NPq 1 



Ar2(.+i) + l- ' \N^^ + 1 ' N^k + ^ J (Po -Pi) < ^ + [N + iPo-Pi] 



since $P^\theta_{t+1} \leq P^\theta_t + \theta_t$ and $p_0 \geq p_1$. Sorting terms, this yields (26). □



B Proofs of the results in Section 4

The following simple observation, which is closely related to Lemma 2 in Beggs (2005), will be useful in the proof of Corollary 4.

Lemma 10. If, in the setup of Subsection 4.1.3, for some $a \in A$, we have $\mathbb{E}_t[x_{t+1}(a)] \geq \varepsilon$ almost surely for all $t \in \mathbb{N}_0$, for some almost surely strictly positive random variable $\varepsilon$, then $\lim_{t\uparrow\infty} f_{a,t} = \infty$.

Proof. We observe that the probability of choosing action $a$ at time $t$ is bounded from below by $f_{a,0}/(V_0 + (t - 1)x_{\max})$. Thus, an application of the Borel-Cantelli lemma yields that action $a$ is chosen infinitely often, say at times $\tau_1 < \tau_2 < \cdots$; set $\tau_0 := 0$. Consider now the martingale $M = \{M_n\}_{n\in\mathbb{N}_0}$ with $M_n := f_{a,\tau_n} - \sum_{i=1}^n\mathbb{E}_{\tau_i - 1}[x_{\tau_i}(a)]$ for all $n \in \mathbb{N}_0$. Then,

$$\frac{f_{a,\tau_n}}{n} = \frac{M_n}{n} + \frac{1}{n}\sum_{i=1}^n\mathbb{E}_{\tau_i - 1}[x_{\tau_i}(a)] \geq \frac{M_n}{n} + \frac{1}{n}\sum_{i=1}^n\mathbb{E}_{\tau_i - 1}[\varepsilon] \to 0 + \varepsilon = \varepsilon > 0$$

as $n \uparrow \infty$, due to the Strong Law of Large Numbers for martingales in Chow (1967) applied to the first term, and using the Martingale Convergence Theorem and Cesàro means for the second term. This directly yields the statement. □



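We now provide the proof of Corollary 4.

Proof (of Corollary 4). We assume, without loss of generality by Lemma 10, that

$$V_0 \geq \frac{2x_{\max}^2}{\varepsilon}. \qquad (27)$$

We observe that

$$L(a, x, f)_a = \sigma(a) + (1 - \sigma(a))\frac{x}{V + x}, \qquad L(b, x, f)_a = \sigma(a) - \sigma(a)\frac{x}{V + x},$$

with $\sigma(a) := f_a/\sum_{c\in A} f_c$ and $V := \sum_{c\in A} f_c$, for all $a \in A$, $b \in A \setminus \{a\}$, $x \in X$, and $f \in \mathfrak{F}$. Therefore, with the obvious definition for $\sigma_t$, we obtain that

$$\begin{aligned}
\mathbb{E}_t[P_{t+1}] - P_t &= \mathbb{E}_t\Big[\sum_{a\in A^*}\sum_{b\in A}\sigma_t(b)\,L(b, x_{t+1}(b), f_t)_a\Big] - P_t\\
&= \sum_{a\in A^*}\mathbb{E}_t\Big[\sigma_t(a)(1 - \sigma_t(a))\frac{x_{t+1}(a)}{V_t + x_{t+1}(a)} - \sum_{b\in A\setminus\{a\}}\sigma_t(a)\sigma_t(b)\frac{x_{t+1}(b)}{V_t + x_{t+1}(b)}\Big]\\
&= \sum_{a\in A^*}\sum_{b\in A\setminus A^*}\sigma_t(a)\sigma_t(b)\,\mathbb{E}_t\Big[\frac{x_{t+1}(a)}{V_t + x_{t+1}(a)} - \frac{x_{t+1}(b)}{V_t + x_{t+1}(b)}\Big]\\
&\geq P_t(1 - P_t)\,\delta_t,
\end{aligned}$$

where $\delta_t$ is defined as

$$\delta_t := \min_{a\in A^*,\, b\in A\setminus A^*}\{\delta_t(a,b)\} := \min_{a\in A^*,\, b\in A\setminus A^*}\mathbb{E}_t\Big[\frac{x_{t+1}(a)}{V_t + x_{t+1}(a)} - \frac{x_{t+1}(b)}{V_t + x_{t+1}(b)}\Big]$$

for all $t \in \mathbb{N}_0$. We notice that

$$\delta_t(a,b) \geq \frac{\mathbb{E}_t[x_{t+1}(a)]}{V_t + x_{\max}} - \frac{\mathbb{E}_t[x_{t+1}(b)]}{V_t} \geq \frac{1}{V_t + x_{\max}}\Big(\mathbb{E}_t[x_{t+1}(a) - x_{t+1}(b)] - \frac{x_{\max}^2}{V_t}\Big) \geq \frac{1}{V_t + x_{\max}}\Big(\varepsilon - \frac{x_{\max}^2}{V_0}\Big) \geq \frac{\varepsilon}{2(V_t + x_{\max})} \geq \frac{\varepsilon}{2(V_0 + (t + 1)x_{\max})}$$

for all $t \in \mathbb{N}_0$, $a \in A^*$, and $b \in A \setminus A^*$, due to (27). This inequality also yields $\sum_{t=0}^\infty \delta_t = \infty$. We set $\theta_t := x_{\max}/V_t \in (0, 1)$ for all $t \in \mathbb{N}_0$ and observe that

$$\frac{\delta_t}{\theta_t} \geq \frac{\varepsilon}{2x_{\max}}\cdot\frac{V_t}{V_t + x_{\max}} \geq \frac{\varepsilon}{4x_{\max}} =: \underline{\delta},$$

a strictly positive random variable. Furthermore,

$$-P_t\frac{x_{\max}}{V_t} \leq -P_t\frac{x_{t+1}(a_{t+1})}{V_t + x_{t+1}(a_{t+1})} \leq P_{t+1} - P_t \leq \frac{x_{t+1}(a_{t+1})(1 - P_t)}{V_t + x_{t+1}(a_{t+1})} \leq \frac{x_{\max}(1 - P_t)}{V_t},$$

and thus, $P_t + (P_{t+1} - P_t)/\theta_t \in [0, 1]$ for all $t \in \mathbb{N}_0$. Therefore, Theorem 2 in conjunction with Lemma 10 yields the result. □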
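The last display says that, after undoing the slowing by $\theta_t = x_{\max}/V_t$, the performance increment stays within $[-P_t, 1 - P_t]$, which is the range condition used with Theorem 2. A quick numerical check under hypothetical attractions, payoffs in $[0, x_{\max}]$, and a single optimal action (all values are illustrative assumptions):

```python
def performance(f, optimal):
    """P_t: total choice probability of the optimal actions under attractions f."""
    return sum(f[a] for a in optimal) / sum(f)


def rescaled_increment(f, a, x, optimal, x_max):
    """(P_{t+1} - P_t) / theta_t with theta_t = x_max / V_t,
    when action a is chosen and earns payoff x."""
    V = sum(f)
    g = list(f)
    g[a] += x
    return (performance(g, optimal) - performance(f, optimal)) * V / x_max


f = [3.0, 2.0, 5.0]  # hypothetical attractions with V_t = 10 > x_max
optimal, x_max = {0}, 1.0
P = performance(f, optimal)
values = [rescaled_increment(f, a, x, optimal, x_max)
          for a in range(len(f)) for x in (0.0, 0.5, 1.0)]
print(P, min(values), max(values))
```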



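We now provide the proof of Lemma 3.

Proof (of Lemma 3). First fix $t \in \mathbb{N}_0$ and observe that

$$|W|\big(\mathbb{E}_t[P_{t+1}] - P_t\big) = \lambda_t\sum_{i,j\in W}\sum_{s\in S(i,j)} p_t^{(i)}(s)\,\mathbb{E}_t\Big[\hat{L}_{t+1}^{(i)}\big(s, a_{t+1}^{(s)}, x_{t+1}^{(s)}, f^{(i)}, \sigma^{(i)}\big)_j\Big(\mathbf{1}_{\{a_{t+1}^{(j)}\in A^*\}} - \mathbf{1}_{\{a_{t+1}^{(i)}\in A^*\}}\Big)\Big],$$

since $\sum_{j\in W}\hat{L}_{t+1}^{(i)}(\cdot)_j = 1$, $s_{t+1}^{(i)}$ is assumed to be conditionally independent of $\{a_{t+1}^{(j)}\}_{j\in W}$ and $\{x_{t+1}^{(j)}\}_{j\in W}$ for all $i \in W$, and $\hat{L}_{t+1}^{(i)}(s, \cdot)_j = 0$ for all $s \notin S(i,j)$ and all $i, j \in W$ by assumption.

Observe now, by the assumption (on the randomization device) of independence of the choice of actions $\{a_{t+1}^{(i)}\}_{i\in W}$ and the payoffs $\{x_{t+1}^{(i)}\}_{i\in W}$, that

$$\begin{aligned}
&\sum_{i,j\in W}\sum_{s\in S(i,j)} p_t^{(i)}(s)\,\mathbb{E}_t\Big[\hat{L}_{t+1}^{(i)}\big(s, a_{t+1}^{(s)}, x_{t+1}^{(s)}, f^{(i)}, \sigma^{(i)}\big)_j\Big(\mathbf{1}_{\{a_{t+1}^{(j)}\in A^*\}} - \mathbf{1}_{\{a_{t+1}^{(i)}\in A^*\}}\Big)\Big]\\
&\quad= \sum_{i,j\in W}\sum_{s\in S(i,j)} p_t^{(i)}(s)\sum_{\substack{a^{(s)}\in A^{|s|}:\\ a^{(i)}\in A\setminus A^*,\, a^{(j)}\in A^*}}\mathbb{P}_t\big(a_{t+1}^{(s)} = a^{(s)}\big)\,\mathbb{E}_t\Big[\hat{L}_{t+1}^{(i)}\big(s, a^{(s)}, x_{t+1}^{(s)}, f^{(i)}, \sigma^{(i)}\big)_j\Big]\\
&\qquad- \sum_{i,j\in W}\sum_{s\in S(i,j)} p_t^{(i)}(s)\sum_{\substack{a^{(s)}\in A^{|s|}:\\ a^{(i)}\in A^*,\, a^{(j)}\in A\setminus A^*}}\mathbb{P}_t\big(a_{t+1}^{(s)} = a^{(s)}\big)\,\mathbb{E}_t\Big[\hat{L}_{t+1}^{(i)}\big(s, a^{(s)}, x_{t+1}^{(s)}, f^{(i)}, \sigma^{(i)}\big)_j\Big]\\
&\quad\geq \delta_t\sum_{i,j\in W}\sum_{s\in S(i,j)} p_t^{(i)}(s)\sum_{\substack{a^{(s)}\in A^{|s|}:\\ a^{(i)}\in A^*,\, a^{(j)}\in A\setminus A^*}}\mathbb{P}_t\big(a_{t+1}^{(s)} = a^{(s)}\big),
\end{aligned}$$

where in the last step we first exchanged $i$ and $j$ in the first sum and used the assumption that $p_t^{(i)}(s) = p_t^{(j)}(s)$ for all $s \in S(i,j)$ and the definition of $\delta_t$. Due to the independence of $\{a_{t+1}^{(i)}\}_{i\in W}$ imposed by the randomization device, we have

$$\sum_{\substack{a^{(s)}\in A^{|s|}:\\ a^{(i)}\in A^*,\, a^{(j)}\in A\setminus A^*}}\mathbb{P}_t\big(a_{t+1}^{(s)} = a^{(s)}\big) = \sum_{\substack{a^{(i)}\in A^*,\\ a^{(j)}\in A\setminus A^*}}\mathbb{P}_t\big(a_{t+1}^{(i)} = a^{(i)}\big)\,\mathbb{P}_t\big(a_{t+1}^{(j)} = a^{(j)}\big) = p_t^{(i)}\big(1 - p_t^{(j)}\big)$$

for all $i \in W$ and $j \in W \setminus \{i\}$. This yields that

$$\sum_{i,j\in W}\sum_{s\in S(i,j)} p_t^{(i)}(s)\sum_{\substack{a^{(s)}\in A^{|s|}:\\ a^{(i)}\in A^*,\, a^{(j)}\in A\setminus A^*}}\mathbb{P}_t\big(a_{t+1}^{(s)} = a^{(s)}\big) = \sum_{i\in W,\, j\in W\setminus\{i\}}\ \sum_{s\in S(i,j)} p_t^{(i)}(s)\,p_t^{(i)}\big(1 - p_t^{(j)}\big) \geq \xi\sum_{i\in W,\, j\in W\setminus\{i\}} p_t^{(i)}\big(1 - p_t^{(j)}\big) \geq \xi\,|W|(|W| - 1)\,P_t(1 - P_t),$$

where the first inequality follows from the assumed observability and the second inequality from the fact that $\sum_{j\in W}(p_t^{(j)})^2 \geq |W|P_t^2$, which is implied by Jensen's inequality. The statement then follows directly. □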
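The final step rests on the identity $\sum_{i\neq j} p^{(i)}(1 - p^{(j)}) = |W|(|W| - 1)P_t(1 - P_t) + \sum_j (p^{(j)})^2 - |W|P_t^2$ together with $\sum_j (p^{(j)})^2 \geq |W|P_t^2$. The short check below, on a few hypothetical probability vectors, illustrates both the inequality and the case of equality (all individuals equally likely to choose an optimal action).

```python
def pairwise_mass(p):
    """Sum over ordered pairs i != j of p_i (1 - p_j)."""
    n = len(p)
    return sum(p[i] * (1 - p[j]) for i in range(n) for j in range(n) if j != i)


def jensen_bound(p):
    """|W| (|W| - 1) P (1 - P), with P the average of p."""
    n = len(p)
    P = sum(p) / n
    return n * (n - 1) * P * (1 - P)


examples = ([1.0, 0.0], [0.3, 0.3, 0.3], [0.9, 0.1, 0.5, 0.2])
for p in examples:
    print(p, pairwise_mass(p), jensen_bound(p))
```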